23 September 2019
Blog: From SQL to Big Data Analytics. What has changed?
The mind-boggling amount of data that the world generates (around 2.5 quintillion bytes per day) offers unlimited potential on the one hand and a seemingly insurmountable challenge on the other for data science companies looking to make long-term business decisions. Adding to the challenge, the data generated is of varied types, each requiring specialised approaches for analysis.
What does big data look like?
Data can be broadly classified into traditional-structured, semi-structured or unstructured formats. Traditional-structured data is information that is stored in relational databases governed by specific rules and conventions. This type of data can be analysed using the domain-specific Structured Query Language (SQL), an approach that is straightforward and well established. Things start getting complicated with semi-structured data, where information is not contained within relational databases but is generally organised in a manner that allows for some analysis, e.g. through semantic tags. Hence, conventional query-based analysis techniques are likely to yield only a limited amount of useful information. Finally, unstructured data is information that is not organised in a pre-defined manner or does not have a pre-defined data model, e.g. emails, videos, etc.
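To make the three categories concrete, here is a minimal Python sketch using only the standard library. The `orders` table, the `customer` and `amount` fields, and the function names are all hypothetical and chosen purely for illustration; they do not come from any particular system. The sketch shows the same kind of information (a payment amount) becoming progressively harder to retrieve as the structure around it disappears.

```python
import json
import re
import sqlite3


def total_by_customer(rows):
    """Structured data: rows with a fixed schema, queried directly with SQL."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table and column names, for illustration only.
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return result


def amounts_from_json(lines):
    """Semi-structured data: tagged fields, but no guarantee a field exists."""
    amounts = []
    for line in lines:
        record = json.loads(line)
        # The "amount" tag may be missing, so each record must be checked.
        if "amount" in record:
            amounts.append(record["amount"])
    return amounts


def amounts_from_text(text):
    """Unstructured data: free text, so we fall back to pattern matching.

    This regular expression is a deliberately naive assumption: it only
    finds amounts written with a leading dollar sign.
    """
    return [float(m) for m in re.findall(r"\$(\d+(?:\.\d+)?)", text)]
```

With the structured rows, a single SQL statement answers the question. With the JSON records, the code has to cope with missing tags. With the free text, the result is only as good as the pattern, which is where the techniques discussed in the rest of this post come in.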
SQL becomes ineffective
An increasing share of data is unstructured, stored in the form of emails, Word documents, PDFs, audio and video files. It is estimated that unstructured data will comprise more than 80% of the data generated in the coming years. This means that conventional SQL-based analysis is ineffective for the bulk of this data. It thus becomes necessary to develop new approaches and technologies in order to extract knowledge or retrieve important information from within this universe of unstructured data.
The solution for big data
Research in data science and big data analytics has made it possible to extract knowledge from data in these various forms, and in many cases has revolutionised the way the world looks at unstructured data. Data science uses a combination of several fields, including mathematics, statistics, information science, machine learning and data mining, to analyse data and extract specific information. Big data analytics offers numerous new techniques and algorithms that can help in analysing unstructured data formats and uncovering the knowledge hidden in them. This has been termed the big data revolution or the fourth industrial revolution. Research into artificial intelligence (AI) has yielded an increasing number of tools and techniques that enable individual users and companies to perform targeted analysis on different types of unstructured data. These AI tools combine the ever-increasing computing power of personal and professional IT devices with improving connectivity to cloud services to provide a growing range of services that seamlessly handle day-to-day personal challenges or help businesses make informed high-risk decisions.