At LotaData, I grew my skills across the board in a fast-paced, positive start-up environment. The company focused on gleaning insights from mobile device geo-location signals. By mapping mobile device locations to actual places visited and events attended, I built real-world profiles for millions of devices.
Primarily, I was in charge of the data science projects that created individual device profiles and inferred visits. Additionally, I maintained the NoSQL database, created schemas for internal data indexing, produced API endpoints, and helped design the customer-facing geo-dashboard. I ran client meetings to determine necessary features and gauge market interest. I created data visualizations for internal use, VC presentations and client outreach. And in my data journalist role, I wrote several articles embedded with my own data visualizations (a few of which were published by marketing and govtech sites).
On the data science side, I applied clustering algorithms to determine points of interest, probable work areas and probable home areas. I created device networks based on vectorized internal classifications of places, events and neighborhoods. I then used graph analysis on these large networks to infer behavioral segments and classify similar devices.
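As a rough illustration of the home/work inference idea, here is a minimal Python sketch. It uses simple grid-based density binning, a much simpler stand-in for the clustering algorithms mentioned above, and is not the production approach. The cell size, the hour windows, and all function names are assumptions made up for this example.

```python
import math
from collections import Counter

# Hypothetical grid cell size in degrees (roughly 100 m of latitude).
CELL_DEG = 0.001

# Hypothetical hour windows; real schedules vary by device.
NIGHT_HOURS = set(range(0, 6)) | {22, 23}
WORK_HOURS = set(range(9, 17))


def cell_of(lat, lon, size=CELL_DEG):
    """Snap a coordinate to the integer index of its grid cell."""
    return (math.floor(lat / size), math.floor(lon / size))


def densest_cell(pings, hours):
    """Return the grid cell with the most pings observed during `hours`.

    `pings` is an iterable of (lat, lon, hour_of_day) tuples.
    Returns None when no ping falls inside the hour window.
    """
    counts = Counter(cell_of(lat, lon) for lat, lon, hour in pings if hour in hours)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def infer_home_and_work(pings):
    """Guess a probable home cell (night pings) and work cell (daytime pings)."""
    return densest_cell(pings, NIGHT_HOURS), densest_cell(pings, WORK_HOURS)
```

Given a list of `(lat, lon, hour)` tuples for one device, `infer_home_and_work(pings)` returns two grid cells. A density-based clustering method would handle irregular shapes and noise better than fixed cells, at the cost of more tuning.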
At Carbon Lighthouse (an awesome company that performs energy efficiency projects on large buildings), I was given the opportunity to delve into building energy usage data. My goal was to help CL ensure that it was delivering true energy savings to its customers. To accomplish this, I built a Measurement & Verification tool in R for the engineering team that made predictions based on multilinear regression models.
I performed extensive factor analysis and used stepwise regression guided by the Akaike Information Criterion (AIC) to find predictive models that minimized complexity while explaining much of the variance in building energy use. Next, I bootstrapped the root mean squared error (RMSE) and R-squared values of the models to check that autocorrelation in the data did not affect the validity of the models. I also tested for overfitting by sampling various subsets of training data, predicting onto the remaining test data and comparing RMSE values.
Overall, the tool was built to be flexible enough for the engineering team to input building-specific values (building schedule, weather, building changes, etc.), to build their own models (and test them with the methods from the previous paragraph), and to receive graphical and statistical outputs at the end. These outputs let the team know whether the building overperformed, underperformed or performed exactly as expected one year after the implementation of an energy efficiency project.
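The overfitting check described above can be sketched roughly as follows. This is an illustrative Python version, not the original R tool: it fits a one-predictor least-squares line instead of a multilinear model, and the function names, split fraction, and repeat count are all assumptions chosen for the example.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(xs, ys, slope, intercept):
    """Root mean squared error of the fitted line on the given points."""
    squared = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared / len(xs))


def holdout_rmse(xs, ys, test_fraction=0.3, repeats=200, seed=0):
    """Repeatedly split the data, fit on the training part, score both parts.

    Returns a list of (train_rmse, test_rmse) pairs, one per random split.
    """
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_test = max(1, int(len(xs) * test_fraction))
    results = []
    for _ in range(repeats):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        test_x = [xs[i] for i in test_idx]
        test_y = [ys[i] for i in test_idx]
        slope, intercept = fit_line(train_x, train_y)
        results.append((rmse(train_x, train_y, slope, intercept),
                        rmse(test_x, test_y, slope, intercept)))
    return results
```

If the test RMSE is consistently much larger than the training RMSE across splits, the model is probably overfitting. Note that random splits ignore time ordering, so strongly autocorrelated data may call for block-wise splits instead.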
I spent this summer as the inaugural intern for the Open Innovation Center, a branch of Samsung that aims to work with Silicon Valley start-ups through partnerships, mergers, acquisitions or incorporation into the Samsung Start-up Accelerator. In addition to meeting with several start-ups, I was tasked with recommending a strategy for Samsung in the mobile messaging app space. In order to evaluate the mobile messaging app companies rigorously, I collected as much data as I could on approximately 30 such companies.
The data ranged from number of users to years since inception to money raised, among other details. With this categorical and continuous data, I ran Multiple Correspondence Analysis (MCA) to understand which combinations of factors were most important, and calculated MCA scores in order to rank the various companies. After ranking the apps in this manner, I grouped the apps according to their scores and was able to make inferences about each group.
In the end, I chose to present two recommendations: one from the well-established group with tens of millions of users and one from the early-stage start-up group that showed promising features and quick user adoption. The first recommendation was to acquire WhatsApp (Facebook did so about 6 months later) and the second was to acquire MightyText, an app with a sleek interface and solid cross-platform integrability that has continued to show promise.
My two main summer projects dealt separately with profitability and environmental initiatives for the Texas electricity provider TXU Energy. In my first project, I built a scenario analysis tool in Excel to gauge the profitability of swapping customers from obsolete electricity plans to current plans. In the tool, customer retention rate, expected natural gas prices, and expected electricity usage (among other variables) could be modified to reflect various conditions.
In my second project, I worked to help renew TXU's partnership with SolarCity, a partnership aimed at getting TXU customers to install solar panels. My role included collaboration on website design and using Survey Monkey on a test group to help determine which wording emphases could most effectively promote the program. I wrote material emphasizing different motivations, ranging from saving the environment to reducing one's carbon footprint to saving money in the long run. In the end, the message that resonated most with the test group (i.e., in a statistically significant manner) focused on saving money, whereas the message with the least effect, unsurprisingly, focused on carbon footprints.
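One common way to check whether two survey messages differ in a statistically significant manner is a two-proportion z-test. The sketch below is only an illustration in Python: the text above does not say which test was actually used, and the function name and any counts passed to it are hypothetical.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two response rates.

    success_a / n_a and success_b / n_b are the favorable responses and
    group sizes for messages A and B. Returns (z, p_value).
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that both messages perform equally.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, with made-up counts, `two_proportion_z_test(60, 100, 40, 100)` compares a 60% favorable rate against a 40% rate. The normal approximation behind this test is only reasonable when each group has a fair number of favorable and unfavorable responses.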