Although data collection can be considered an infringement on people’s privacy, sharing your data can help society create better modeling tools and algorithms to identify early warning signs of major issues such as poverty, homelessness, and cancer.
Data collection has been around for centuries and involves more than the cookies stored on your phone or laptop. In 1790, when the US Census was first conducted, census takers walked from house to house to survey residents and collect data. Since then, American households have filled out the census every decade, and many now also take part in the American Community Survey (ACS), which is conducted every year. Given its frequency, the ACS helps determine how the US government distributes $675 billion in funds.
The importance of the Decennial US Census cannot be overstated. The census allows the government to collect important demographic data, which lawmakers then use to distribute federal and state funds appropriately. Businesses also use census data to find prime locations for development; they can identify dilapidated neighborhoods and build factories, offices, and stores there to create jobs.
However, because the census takes place only once a decade, it cannot properly account for recent demographic changes as the end of the decade approaches. The ACS combats this problem by collecting information on 46 topics from a smaller sample of households deemed to be representative of the country. As with the census, lawmakers may use the ACS to identify demographic changes and target reforms and appropriation packages.
Many people, however, do not realize that census data is quite comprehensive and easily accessible by the public. Websites offer interactive and simple ways to visualize the data and offer users the opportunity to view over 500,000 data points from all Decennial Censuses, American Community Surveys, and other datasets such as the annual FBI Crime Data report and tallied votes from American Presidential Elections.
Honestly, the specificity of the data can be simultaneously quite fascinating and terrifying. Using SocialExplorer.com, I searched for my home address in the 2010 Census and found the specific data for my family. I was able to check my family’s race and my parents’ marital status, employment sector, and income bracket.
Perhaps most importantly, census respondents know exactly what is being asked of them and what information the government will collect.
However, the online collection of data tends to be sneakier, and privacy policies tend to be more ambiguous. Real dangers exist, and it’s often young internet users who are most at risk and least aware. Prominent companies have been involved in large scandals, such as Facebook giving away its users’ data. The solution would be greater transparency between companies and their users: clearer disclosure of what data will be collected would allow users to become more aware and better protect themselves from such scandals in the future.
Nowadays, data collection isn’t limited to government surveys; it also includes the tracking employed by tech giants like Google and Facebook. Google tracks a host of information ranging from what you search to what ads or links you click. Google Maps, Flights, Books, Translate, and Gmail also keep records of what you search in order to build your profile.
Unfortunately, all of this collected information raises major privacy concerns. Hackers can access and release personal information, and tech companies can effectively profile anyone. While attending a data analytics class over the summer, I met a kid named Yash from Silicon Valley. Ironically, we were learning to analyze data, yet he kept his online presence as minimal as possible by using VPNs, erasing his history, and avoiding social media sites. He was paranoid that he could be held accountable in the future for a mistake made online, so he used his laptop as cautiously as possible.
However, I believe that Yash was overly cautious. Despite privacy concerns, data collection also drives innovation through machine learning. Larger, more accurate datasets can be used for complex modeling and prediction, and the applications for better predictions are endless. The analysis and modeling of big data can help distribute social services more effectively by identifying early warning signs. For example, lawmakers and community leaders can identify early signs of gentrification by looking at previous patterns. In Chicago, community leaders mapped rising real estate values, changing demographics, and stagnating salaries to identify at-risk neighborhoods. In California, local legislatures are working with data scientists to identify early warning signs of homelessness by analyzing data collected by dozens of agencies.
In the private sector, personal profiles can be used to target specific audiences with certain advertisements, allowing companies to maximize profitability. Using that data, companies can predict future purchases by analyzing previous patterns, which allows them to avoid over- and underproducing their products. They can also track the performance of their marketing campaigns and make necessary adjustments.
According to a 2015 study conducted by Ernst and Young, customers crave personalized attention, and 65% of customers are willing to share their data if it means that they will receive personalized recommendations and special offers. Personalized ads allow customers to learn about products that they may never have heard of but might still be likely to purchase. Companies can keep track of purchase histories and suggest items similar to previous selections instead of advertising at random.
More data also makes the algorithms themselves more precise. When data scientists create models, they must break their data into parts. One part, which tends to be larger, is used as a training set, where the model learns what responses to associate with certain inputs. After running through the training set numerous times, the model is then tried against the second part, the test set. The test set's inputs are fed into the model, and each output is checked against the known answer to determine the accuracy of the model. With a larger amount of data to train on, the model should theoretically be able to make more accurate predictions in the future.
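To make the idea of splitting data concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data is synthetic, the 80/20 split is just a common convention, and the "model" is a simple cutoff value rather than anything a real data scientist would deploy. The function names (`make_synthetic_data`, `train_test_split`, `fit_threshold`, `accuracy`) are hypothetical helpers written for this example.

```python
import random


def make_synthetic_data(n, seed=0):
    """Hypothetical data: one numeric input; the label is 1 when the input
    is above 0.5, with 10% of labels flipped to mimic noisy real-world data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        label = 1 if x > 0.5 else 0
        if rng.random() < 0.1:
            label = 1 - label
        rows.append((x, label))
    return rows


def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the data, then cut it into a larger training part
    and a smaller test part."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_threshold(train_rows):
    """'Training': choose the cutoff that labels the most training rows correctly."""
    def correct(threshold):
        return sum((x > threshold) == bool(label) for x, label in train_rows)

    candidates = sorted(x for x, _ in train_rows)
    return max(candidates, key=correct)


def accuracy(threshold, rows):
    """Share of rows whose predicted label matches the known label."""
    hits = sum((x > threshold) == bool(label) for x, label in rows)
    return hits / len(rows)


data = make_synthetic_data(1000)
train_rows, test_rows = train_test_split(data)

# The model only ever sees the training rows while it learns.
threshold = fit_threshold(train_rows)

# The held-out test rows then check how well that learning generalizes.
test_accuracy = accuracy(threshold, test_rows)
print(f"learned threshold: {threshold:.2f}, test accuracy: {test_accuracy:.0%}")
```

The key design choice is that the test rows are never used during training, so the accuracy score reflects how the model handles data it has not seen before. Changing the number passed to `make_synthetic_data` is one way to experiment with how the amount of training data affects the result.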
As the internet continues to become a more integral part of our lives, datasets will continue to grow. However, the public must be more aware of what data is being collected, and companies must be more transparent to build greater trust.