Steve Lohr has written a strong piece for the NY Times on the "age of big data". I sent it to my UCLA freshmen to read. We will see if they bother. I want to make a distinction here between (1) crunching data from an existing data set and (2) running an explicit field experiment in mid-stream. I also want to talk about heterogeneity.
Permit me to talk about a hypothetical example of data from an electric utility.
Your electric utility provides electricity to everyone (residential, commercial and industrial customers) in its service zone. If you receive a monthly bill, the utility has data on your consumption (kWh) and on the total dollar amount you are charged. It also knows your street address.
In this age of "big data", this adds up to a big data set. What can be done with these data? Since the utility knows your street address, it can purchase and merge in other data so that it has some basic demographic data about you and about the home you live in. For example, from the Los Angeles County office, you can purchase data on the physical attributes of each home in LA. Such data could be merged by street address with the electric utility's electricity consumption data.
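To make the merge step concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the field names (`kwh`, `bill_usd`, `sqft`, `year_built`), the sample records, and the crude address-cleaning rule are my assumptions for illustration, not the utility's or the county's actual format. Real address matching is messier than this.

```python
def normalize_address(addr: str) -> str:
    """Crude cleanup: uppercase, drop punctuation, collapse extra spaces."""
    cleaned = "".join(ch for ch in addr.upper() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def merge_by_address(billing, homes):
    """Join billing rows to home-attribute rows on the cleaned street address.

    Billing rows with no matching home record are dropped (an inner join).
    """
    homes_by_addr = {normalize_address(h["address"]): h for h in homes}
    merged = []
    for bill in billing:
        key = normalize_address(bill["address"])
        home = homes_by_addr.get(key)
        if home is not None:
            merged.append({**home, **bill, "address": key})
    return merged


# Made-up records, for illustration only.
billing = [
    {"address": "123 Main St.", "kwh": 650, "bill_usd": 98.50},
    {"address": "456 oak  ave", "kwh": 1200, "bill_usd": 201.00},
    {"address": "789 Pine Rd", "kwh": 300, "bill_usd": 45.25},
]
homes = [
    {"address": "123 MAIN ST", "sqft": 1400, "year_built": 1962},
    {"address": "456 Oak Ave", "sqft": 3100, "year_built": 1998},
]

merged = merge_by_address(billing, homes)
for row in merged:
    print(row)
```

The third billing record has no matching home record, so it falls out of the merged data. In practice, how many records fail to match (and whether they differ systematically from those that do) is worth checking before running any statistics.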
Once you have created this "big data merge", you can use standard statistical methods to test a variety of hypotheses concerning who consumes a lot of electricity. To give you a sense of this type of research, you can read my paper with Dora Costa posted here.
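As a toy illustration of what a "standard statistical method" looks like here, the sketch below fits a one-variable least squares line of monthly kWh on home square footage. The numbers are invented and the single-regressor setup is my simplification; it is not the specification used in the paper mentioned above.

```python
def ols_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data: home size (square feet) and monthly electricity use (kWh).
sqft = [1000, 2000, 3000]
kwh = [400, 700, 1000]

intercept, slope = ols_line(sqft, kwh)
print(f"Each extra square foot is associated with {slope:.2f} more kWh per month")
```

With real merged data you would add more explanatory variables (year built, household demographics, climate) and look at how the estimated relationships differ across types of households.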
Now, let's pivot and talk about #2 above. Once the "big data" research effort has baseline data about different individuals, it can then randomize a new incentive. For example, a supermarket may offer a targeted price discount on coffee to people who live in zip code 90024. Such an offer may last for 1 month and give a random subset of households in that zip code a 30% discount. For the economists out there, note that this provides two points on such households' demand curve: the quantity purchased at the regular price and the quantity purchased at the discounted price. If the supermarket can use such field experiments to sketch out different coffee consumers' demand curves, then it can use this information to figure out its profit-maximizing pricing strategy. This is an example of how "big data" combined with randomized field experiments leads to better corporate decision making.
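Here is a rough sketch of the coffee experiment in Python. The prices, the average quantities, and the unit cost are all made up. The straight-line demand curve through the two observed points and the constant unit cost are strong simplifying assumptions of mine, chosen only to show how two points on a demand curve can feed into a pricing decision.

```python
import random


def assign_discount(household_ids, share=0.5, seed=0):
    """Randomly pick a share of households to receive the discount."""
    rng = random.Random(seed)
    ids = list(household_ids)
    rng.shuffle(ids)
    return set(ids[: int(len(ids) * share)])


def arc_elasticity(p0, q0, p1, q1):
    """Midpoint (arc) price elasticity of demand between two points."""
    pct_change_q = (q1 - q0) / ((q1 + q0) / 2)
    pct_change_p = (p1 - p0) / ((p1 + p0) / 2)
    return pct_change_q / pct_change_p


def linear_demand(p0, q0, p1, q1):
    """Straight line q = intercept + slope * p through the two points."""
    slope = (q1 - q0) / (p1 - p0)
    intercept = q0 - slope * p0
    return intercept, slope


def profit_maximizing_price(intercept, slope, unit_cost):
    """Price maximizing (p - c) * (intercept + slope * p), assuming slope < 0."""
    return (slope * unit_cost - intercept) / (2 * slope)


# Hypothetical household ids in zip code 90024.
households = range(1000)
treated = assign_discount(households, share=0.5, seed=42)

# Made-up outcomes: regular price vs. a 30% discount, and the average
# units bought per household in the control and treated groups.
regular_price, discount_price = 10.0, 7.0
control_qty, treated_qty = 2.0, 3.0

elasticity = arc_elasticity(regular_price, control_qty, discount_price, treated_qty)
a, b = linear_demand(regular_price, control_qty, discount_price, treated_qty)
best_price = profit_maximizing_price(a, b, unit_cost=4.0)

print(f"{len(treated)} households treated")
print(f"arc elasticity: {elasticity:.2f}, price under these assumptions: {best_price:.2f}")
```

Because the discount is assigned at random, the treated and control households should look alike on average, so the gap in their purchases can be attributed to the price difference rather than to who they are. Running the same calculation separately for different groups of households is one way to get at heterogeneity in demand.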
At UCLA's Institute of the Environment, Magali Delmas and I are looking for more corporate partners who are willing and eager to explore the demand for "green products" and to use the combination of "big data" and field experiments to learn. You know where to find us!