The case for learning data science

The world we live in today is filled with data

It’s almost impossible to interact with the world without generating data: when you use your phone, interact on social media, browse websites, or even drive a car!

The impact of this massive data collection is hard to quantify, but the documentary ‘The Social Dilemma’ on Netflix states it best:

“Never before have a handful of tech designers had such control over the way billions of us think, act, and live our lives.”

Learning the basics of data science is essential in today’s world. It’s important to be aware of how your data is collected, who can see it, and what others can do with it.

Data science is more than just understanding programming and statistics — it’s about understanding the systems surrounding our daily lives.

The most critical areas of data science to understand are data collection, data warehousing, modeling, and use cases for machine learning. Learning these concepts can help you discover the data you are generating, how others use your data, and how you can leverage data science for your business and/or personal life.

Data collection happens constantly, especially in today’s digital age. Knowing what information is collected and why it is collected is essential — from both a user and a business perspective.

If you’re running an online business, your storefront is your website. It is crucial to know how users interact with your site, which pages get the most engagement, which referral channels drive site visits, and how many visitors leave the site without taking any action.
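As a rough illustration of the metrics listed above, here is a minimal Python sketch that computes page views, visits per referral channel, and a bounce rate from a small, hypothetical page-view log. The event format and the "single-page session" definition of a bounce are simplifying assumptions; analytics platforms collect and define these metrics in their own ways.

```python
from collections import Counter

# Hypothetical page-view log: (session_id, page, referrer) tuples.
# Real analytics tools record far more detail; this is only an illustration.
events = [
    ("s1", "/home", "google"),
    ("s1", "/pricing", "google"),
    ("s2", "/home", "twitter"),
    ("s3", "/blog", "google"),
    ("s3", "/home", "google"),
    ("s3", "/contact", "google"),
    ("s4", "/pricing", "newsletter"),
]

def site_summary(events):
    """Return page views per page, sessions per referrer, and the bounce rate."""
    page_views = Counter(page for _, page, _ in events)
    pages_per_session = Counter(session for session, _, _ in events)
    # Attribute each session to the referrer of its first recorded event.
    first_referrer = {}
    for session, _, referrer in events:
        first_referrer.setdefault(session, referrer)
    sessions_by_referrer = Counter(first_referrer.values())
    # Assumption: a "bounce" is a session that viewed only one page.
    bounces = sum(1 for n in pages_per_session.values() if n == 1)
    bounce_rate = bounces / len(pages_per_session)
    return page_views, sessions_by_referrer, bounce_rate

views, referrers, bounce_rate = site_summary(events)
print(views.most_common(1))            # [('/home', 3)]
print(referrers)
print(f"Bounce rate: {bounce_rate:.0%}")  # Bounce rate: 50%
```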

Businesses monitor interactions with their websites very closely to optimize the customer’s experience. Many companies go further and run experiments in which different groups of visitors see slightly different site layouts, then compare how each group behaves; this is called A/B testing. These experiments run continuously and at vast scale, allowing businesses to keep tweaking their site layouts to maximize their end goal, whether that is increased interaction, sales, or some other target.
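To make A/B testing concrete: once an experiment has run, the two layouts are often compared with a statistical test. Below is a minimal sketch of one common choice, a two-sided two-proportion z-test, applied to hypothetical conversion counts for layouts A and B. The numbers are invented, and real experimentation platforms typically add safeguards (sample-size planning, corrections for running many tests) that this sketch omits.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both layouts convert equally well.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value

# Hypothetical results: 5,000 visitors saw each layout.
p_a, p_b, z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"A: {p_a:.1%}  B: {p_b:.1%}  z = {z:.2f}  p = {p_value:.4f}")
```

A small p-value (0.05 is a common cutoff) suggests the difference is unlikely to be due to chance alone; it does not by itself say whether the improvement is large enough to matter for the business.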

There are many other areas where data is generated in the business process. These areas include production systems, HR systems, transactional databases, and more. Being aware of what data is being generated by your business (or yourself) and how to best leverage it is the first step to learning data science.

Data warehousing is the backbone of most data analysis. Generally speaking, a data warehouse is a central repository for the information generated within a company. This information comes from many sources, such as transaction systems, marketing software, surveys, or external vendors.

Building an effective data warehouse can be a huge undertaking, mainly because data must be collected from disparate areas of the business and integrated into a standardized format. Work must be done with multiple stakeholders to ensure all data is correctly extracted from source systems, transformed, and loaded into the warehouse, a process known as ETL (extract, transform, load).
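A toy version of the extract, transform, and load steps might look like the sketch below. It assumes two hypothetical source systems (a store export in CSV and web orders stored with amounts in cents and dates as DD/MM/YYYY) and uses an in-memory SQLite database as a stand-in for a real warehouse. All source names and field names are invented for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical exports from two source systems with different conventions.
STORE_CSV = """order_id,amount_usd,date
1001,25.50,2023-01-05
1002,40.00,2023-01-06
"""
WEB_ORDERS = [
    {"id": "W-17", "total_cents": 1999, "ts": "05/01/2023"},  # DD/MM/YYYY
]

def extract():
    """Pull raw records out of each source system."""
    store_rows = list(csv.DictReader(io.StringIO(STORE_CSV)))
    return store_rows, WEB_ORDERS

def transform(store_rows, web_rows):
    """Convert every record into one shape: (source, order_id, amount in dollars, ISO date)."""
    records = []
    for row in store_rows:
        records.append(("store", row["order_id"], float(row["amount_usd"]), row["date"]))
    for row in web_rows:
        day, month, year = row["ts"].split("/")
        records.append(("web", row["id"], row["total_cents"] / 100, f"{year}-{month}-{day}"))
    return records

def load(records, conn):
    """Write the standardized records into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(source TEXT, order_id TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(transform(*extract()), conn)
(total,) = conn.execute("SELECT SUM(amount) FROM orders").fetchone()
print(round(total, 2))  # 85.49
```

Once every source lands in the same table with the same units and date format, a single query can answer questions that previously required stitching systems together by hand.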

According to AWS, some of the main benefits of using a data warehouse are:

  1. Informed decision making
  2. Consolidated data from many sources
  3. Historical data analysis
  4. Data quality, consistency, and accuracy
  5. Separation of analytics processing from transactional databases, improving the performance of both systems

Typically, this is the most time-consuming part of the data science process. As a result, many SaaS offerings have emerged recently to help alleviate this bottleneck. Today, there are tools that enable business analysts to complete integration work that previously only a skilled data engineer could handle. This opens the door for many smaller companies and individuals to work with their data and gather insights.

Modeling is one of the most important pieces of data science. At a high level, a model is a mathematical representation used to explain reality. Models are used to explain and predict company sales, forest fires, and weather events, and even to recommend movies on Netflix.

Artificial intelligence and machine learning are built upon these models, using available data to predict a future outcome. The right model varies widely with each specific use case, so I’ll avoid going into specifics. Briefly, some models are easy to explain, while others are hard to explain but have high predictive power. Some situations require explainability (medical diagnoses, regulatory compliance, etc.), while others are more concerned with accuracy.
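As a small illustration of an easy-to-explain model, the sketch below fits a straight line by ordinary least squares to hypothetical monthly ad-spend and sales figures, then uses it to forecast. The data are invented; the point is that the fitted slope can be read directly as "extra sales per extra unit of ad spend", which is the kind of explainability that more powerful but opaque models give up.

```python
# Hypothetical monthly data: ad spend (thousands of dollars) and sales (thousands of units).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(ad_spend, sales)
# The slope is directly interpretable: extra sales per extra unit of ad spend.
print(f"sales ~= {intercept:.2f} + {slope:.2f} * ad_spend")
# Use the fitted line to predict an unseen month.
print(f"forecast at ad_spend = 6.0: {intercept + slope * 6.0:.2f}")
```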

The main takeaway here is that models are mathematical formulas, often complex ones, used to represent reality and predict future outcomes. Being able to leverage their predictive power for your business or your personal pursuits is key to learning data science.

Oxford defines machine learning as:

“the use and development of computer systems that are able to learn and adapt without following explicit instructions, by using algorithms and statistical models to analyze and draw inferences from patterns in data”.

In simple terms, machine learning uses a model (or models) built from data to predict future outcomes.

One of the most common applications of machine learning is Natural Language Processing (NLP). If you’re familiar with the GPT-3 model, you know just how powerful machine learning can be. GPT-3 can produce incredibly realistic articles; at times, its writing is indistinguishable from a human’s.

Another common application of machine learning is for recommendations and advertisements. When Facebook or LinkedIn recommend ‘People you may know’, they leverage machine learning to find associations between your profile and similar profiles. When Netflix recommends what movie or TV show to watch next, it predicts what you’ll watch based on what you’ve watched in the past.
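The idea of predicting what you’ll watch from what you’ve watched can be sketched with a very simple form of collaborative filtering: find users whose viewing histories overlap with yours and suggest what they watched that you haven’t. This is a hypothetical, minimal illustration using Jaccard similarity on made-up histories; it is not how Netflix, Facebook, or LinkedIn actually build their recommenders, which are far more sophisticated.

```python
# Hypothetical viewing histories: user -> set of titles watched.
history = {
    "ana":   {"Dark", "Ozark", "Narcos"},
    "ben":   {"Dark", "Ozark", "Mindhunter"},
    "chloe": {"The Crown", "Bridgerton"},
    "dev":   {"Ozark", "Narcos", "Mindhunter", "Breaking Bad"},
}

def jaccard(a, b):
    """Share of titles two users have in common (0 = none, 1 = identical)."""
    return len(a & b) / len(a | b)

def recommend(user, history, k=2):
    """Score unseen titles by how similar the users who watched them are to `user`."""
    seen = history[user]
    scores = {}
    for other, titles in history.items():
        if other == user:
            continue
        sim = jaccard(seen, titles)
        for title in titles - seen:
            scores[title] = scores.get(title, 0.0) + sim
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, score in ranked[:k] if score > 0]

print(recommend("ana", history))  # ['Mindhunter', 'Breaking Bad']
```

Titles watched by users with no overlap (here, chloe) score zero and are never suggested, which is the intuition behind "people like you also watched".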

Personalized marketing is one of the most powerful applications of machine learning for a business. Customers can be segmented based on shopping patterns, site interactions, or other criteria. A machine learning model identifies which segment a customer falls into, allowing the business to send targeted, personalized ads. Knowing the potential use cases for machine learning, and where they fit into your digital strategy, is the final key to learning data science.
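Segmentation is often done with clustering. The sketch below runs a plain k-means on hypothetical customers described by two features, orders per year and average order value, and splits them into two segments. The features, the data, and the naive choice of starting centroids are all assumptions for illustration; libraries such as scikit-learn offer more robust implementations.

```python
import math

# Hypothetical customers: name -> (orders per year, average order value in dollars).
customers = {
    "c1": (2, 20.0), "c2": (3, 25.0), "c3": (1, 15.0),
    "c4": (20, 30.0), "c5": (25, 35.0), "c6": (22, 28.0),
}

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, k, iterations=20):
    """Plain k-means: alternate between assigning points and recomputing centroids."""
    centroids = list(points[:k])  # naive start: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its cluster; keep it in place if the cluster is empty.
        centroids = [
            tuple(sum(values) / len(cluster) for values in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids

points = list(customers.values())
centroids = kmeans(points, k=2)
segments = {name: nearest(p, centroids) for name, p in customers.items()}
print(segments)  # {'c1': 0, 'c2': 0, 'c3': 0, 'c4': 1, 'c5': 1, 'c6': 1}
```

Here the two segments roughly correspond to occasional shoppers and frequent, higher-spending ones; a business could then target each segment with different offers.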

In conclusion, anyone can learn the basics of data science. These powerful tools can be used to build a business, improve one’s personal life, or simply better understand the world around us. Knowing what is currently being done with data, and what data can do for you, is key to success in our rapidly evolving world.

Thanks for reading!

Below are some of the educational resources I’ve found most helpful if you’d like to dig deeper into data science:

  1. Data Science from Scratch — First Principles with Python by Joel Grus
  2. If you’re willing to subscribe to Medium, it has plenty of excellent technical articles, and I read a lot there:
  3. These guys are free, and they have some interesting articles:
  4. YouTube can also be a great source of free learning materials
  5. DataCamp, Coursera, and edX are all good paid options. I’m personally taking a few MicroMasters courses from MIT on edX
