Some argue that the dawn of “Big Data” (BD) arrived with the “Internet Age” roughly 30 years ago. If so, the 1990s were its childhood: the World Wide Web was created, the domain “google.com” was registered, and other important baby steps were taken. Important as they were, it is the 21st century that marks the real beginning of BD as we know it.
But what is “Big Data”?
In essence, Big Data refers to data sets that are so large, arrive so fast, and are usually so complex that they cannot be processed using traditional methods.
Closely tied to the concept of big data is another popular term: the 3Vs.
The first V stands for Volume: yes, BD is huge, and it is growing constantly. At the beginning of 2020, the digital universe was estimated to consist of a whopping 44 zettabytes of data. To put that into perspective, one zettabyte is roughly equal to a trillion gigabytes.
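To make that scale concrete, here is a small back-of-the-envelope conversion in Python. It assumes decimal (SI) units, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes; the function and constant names are just illustrative choices.

```python
# Assumption: decimal (SI) prefixes, so 1 GB = 10**9 bytes and 1 ZB = 10**21 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21


def zettabytes_to_gigabytes(zb):
    """Convert a whole number of zettabytes to gigabytes using SI (base-10) units."""
    return zb * BYTES_PER_ZB // BYTES_PER_GB


if __name__ == "__main__":
    print(f"1 ZB  = {zettabytes_to_gigabytes(1):,} GB")   # 1 ZB  = 1,000,000,000,000 GB
    print(f"44 ZB = {zettabytes_to_gigabytes(44):,} GB")  # 44 ZB = 44,000,000,000,000 GB
```

So the 2020 estimate works out to about 44 trillion gigabytes. In binary units (gibibytes and zebibytes) the factor is 2^40, about 1.1 trillion, so “roughly a trillion” holds either way.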
The second V stands for Velocity: our ever-connected world means that companies are inundated with data. Every person who uses a connected device, surfs the internet, or uses social media generates their own stream of data. So the velocity of BD simply refers to the sheer speed at which data is generated and gathered.
The third V stands for Variety: BD comes from myriad sources, which is partly what makes it so complex. It also takes many different forms, from video, text, and image data to audio data, real-time data, and beyond, and therefore requires different types of processing and analysis.
Why are so many organizations investing in infrastructure, research, and other tools for using BD? What are the benefits?
Offers better insights
BD offers the potential for vastly enhanced data analytics. Used properly, it lets organizations spot entirely new trends, segment customers with an astonishing degree of accuracy, and achieve unprecedented levels of innovation in technology and product design.
Offers a unique competitive advantage
BD often arrives as a continuous flow of real-time information. By harnessing this flow, organizations can adapt to changes in real time. This means they can stay ahead of the competition in ways that companies of the past could only dream of.
Huge potential to improve productivity
Tools like Apache Hadoop and Spark allow data analysts to work with datasets they wouldn’t otherwise be able to handle. This improves productivity for the analysts themselves, and with these enhanced tools they can also glean far greater insights and detect patterns that help boost productivity across the wider workforce.
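Hadoop and Spark both rely on the same basic idea: split a large dataset into chunks, process each chunk independently, then combine the partial results (the MapReduce model that Hadoop popularized). The toy word count below is a minimal sketch of that idea in plain Python. It does not use either tool’s actual API, and the function names and chunk size are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_chunk(lines):
    """Map step: count the words in one chunk, independently of every other chunk."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial word counts into one."""
    return left + right


def word_count(lines, chunk_size=2):
    """Split the input into chunks, map each chunk, then reduce the partial results."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # Here the chunks are processed one after another; a framework such as
    # Hadoop or Spark would instead distribute them across many machines.
    partial_counts = map(map_chunk, chunks)
    return reduce(merge_counts, partial_counts, Counter())


if __name__ == "__main__":
    documents = [
        "big data is big",
        "data grows fast",
        "fast data is big data",
    ]
    print(word_count(documents).most_common(2))  # [('data', 4), ('big', 3)]
```

Because each chunk is counted without looking at the others, the map step can be spread over as many machines as there are chunks, which is roughly what makes datasets beyond a single computer’s reach workable.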
Internet of Things
For the most part, humans use the internet to communicate with each other, with machines acting as the go-between. However, with the Internet of Things, we are starting to see devices communicate directly with each other. This has tons of potential. For example, your thermostat could automatically adjust the temperature based on weather reports, your car could send information to the manufacturer to improve safety measures, or your fridge could simply remind you to buy milk!
This is just a taster of how BD can potentially transform the world around us. While it’s pretty exciting, with all this potential comes plenty of risk, too.
If gathered, stored, or used wrongly, BD poses some serious dangers.
Broadly speaking, the risks can be divided into four main categories: security issues, ethical issues, deliberate abuse by malevolent players (e.g., organized crime), and unintentional misuse.
The more data an organization collects, the more expensive and difficult it is to store safely. According to the Risk-Based Security Mid-Year Data Breach report, 4.1 billion records were exposed through data breaches in the first half of 2019 alone. This highlights just how important data security is, and how hard it is for organizations to keep our data safe.
Related to this is the issue of privacy. Governments, social media giants, insurance companies, and healthcare providers are just a handful of the organizations that have unprecedented levels of access to our data. While they are bound by data protection laws (with the potential for huge fines), the growing number of high-profile data breaches in the last few years shows that more action is needed. Organizations, especially big tech companies, may have information on where we live, where we go, how we spend our money, and so on. With personal bank details and other sensitive information in their care, and cyberattacks on the rise, this raises the question: just because companies can store vast amounts of data, does that mean they should?
Ethical issues
Even if organizations manage to keep our data safe from hackers and cyberattacks, that does not rule out the possibility that they might misuse the information themselves. Yes, data protection laws are in place, but there is still a grey area around how companies that have obtained data legally may use it.
Take insurance providers and credit card companies. It’s no revelation that these organizations set premiums and limits based on customer behavior. For instance, if you’ve ever had a car accident, you’ll know your car insurance premium goes up. BD lets such companies draw on far more detailed behavioral data than ever before, and that is where the grey area begins.
There are multiple other ethical issues too, around consent, ownership, and privacy. These concerns have driven the emergence of the “right to be forgotten”, which has led to new laws being introduced.
Abuse by malevolent players
Another danger arises when third parties get their hands on sensitive information. Phishing, bank fraud, and insurance scams are all common examples of how BD can be deliberately misused by organized crime groups. The days of try-their-luck emails offering you a million dollars if you just send through your bank details are long gone! If you’ve recently been the victim of a scam, you’ll know just how sophisticated they can be.
Those deliberately seeking to abuse BD are one problem, but not all dangers are premeditated. Enter machine learning, a crucial tool for analyzing and extracting insights from BD. Although machine learning algorithms learn on their own, humans still decide how they learn and which data they learn from, and that allows human bias to creep into the algorithm.
Human bias, bad practice in data analytics, or even just poor-quality data can lead to bad insights. If these insights are used to make important decisions, for example about finances or safety, the consequences can be harmful.
To avoid these kinds of risks in future, we must address systemic problems before the technology becomes more widely adopted.
Big data has vast potential – it can be used to glean ever more powerful insights and to transform the way the world works.
It comes with security issues – security and privacy are key concerns when it comes to BD.
Bad players can abuse it – if data falls into the wrong hands, it can be used for phishing, scams, and to spread disinformation.
There are ethical issues – BD is a young field, and its ethics are still evolving. This is why some are pushing for a Data Science Oath and for ethical guidelines to be developed.
The fields of data science and big data analytics continue to evolve and grow. Let’s hope humanity uses them in the best possible way!
Find out more information about digital technology on our blog!