Kahibaro

What is data science?

Understanding Data Science

Data science is about using data to answer questions and make decisions. It combines three main areas: programming, math and statistics, and domain knowledge.

In simple terms:

$$\text{Data Science} \approx \text{Programming} + \text{Math/Stats} + \text{Domain Knowledge}$$

You don’t need to be an expert in all three to start, but data science lives where they overlap.


What Counts as Data?

Data is any information you can store and process. Some common types:

Data science often turns these into a structured form (like tables) so they can be analyzed.
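As a minimal sketch of that idea, the code below assumes log lines that store a visit time, a user ID, and a page URL separated by commas. The format, the sample lines, and the column names are made up for illustration. Python's built-in `csv` module can turn that raw text into a list of rows, one dictionary per row:

```python
import csv
import io

# Hypothetical raw log text: one visit per line, comma-separated, no header row.
raw_log = """2024-05-01 20:15, user_17, /home
2024-05-01 09:42, user_03, /pricing
2024-05-02 21:05, user_17, /blog
"""

def to_table(text):
    """Parse raw comma-separated lines into a list of row dicts (a simple table)."""
    reader = csv.DictReader(
        io.StringIO(text),
        fieldnames=["visit_time", "user_id", "page_url"],  # illustrative column names
        skipinitialspace=True,  # ignore the space after each comma
    )
    return [dict(row) for row in reader]

table = to_table(raw_log)
print(len(table), "rows")
print(table[0])
```

Each row now has named columns, so questions such as "how many visits per user?" become simple operations over a list instead of string juggling.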


The Typical Data Science Process

Data science isn’t just “running a model.” It’s a step-by-step process. A very common loop looks like this:

  1. Define the question
  2. Collect data
  3. Clean and prepare data
  4. Explore and visualize
  5. Model and analyze
  6. Draw conclusions and communicate
  7. Act on results and iterate

1. Define the Question

Good data science starts with a clear question, like:

A clear question guides what data you need and how you’ll analyze it.

2. Collect Data

Data can come from many places:

Sometimes you collect data continuously; other times you work with snapshots.

3. Clean and Prepare Data

Real-world data is often messy:

Data cleaning involves:

This step usually takes a large portion of a data scientist’s time.
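The cleaning step can be sketched in plain Python. The records below are invented for illustration. The rules they apply (trim whitespace, normalize capitalization with `str.title`, turn an empty age into `None`, drop duplicates) are one reasonable set of choices, not a fixed recipe:

```python
# Hypothetical messy records: inconsistent casing and whitespace, a duplicate,
# a missing value, and numbers stored as text.
raw = [
    {"name": " Alice ", "city": "paris", "age": "34"},
    {"name": "alice", "city": "Paris", "age": "34"},    # duplicate after cleanup
    {"name": "Bob", "city": "LONDON", "age": ""},       # missing age
    {"name": "Chen", "city": " Berlin", "age": "29"},
]

def clean_record(rec):
    """Normalize text fields and convert age to int (None if missing)."""
    age_text = rec["age"].strip()
    return {
        "name": rec["name"].strip().title(),
        "city": rec["city"].strip().title(),
        "age": int(age_text) if age_text else None,
    }

def clean(records):
    """Clean each record and drop exact duplicates, keeping the first one seen."""
    seen = set()
    result = []
    for rec in records:
        row = clean_record(rec)
        key = (row["name"], row["city"], row["age"])
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result

cleaned = clean(raw)
for row in cleaned:
    print(row)
```

Whether to keep a missing value as `None`, fill it with a typical value, or drop the row depends on the question you are answering.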

4. Explore and Visualize

Before building any fancy models, you want to understand your data:

Exploration helps you spot:
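As a small illustration of a first look at a dataset, here is a minimal sketch using the standard `statistics` module. It assumes a made-up list of daily order counts and uses a simple "more than 2 standard deviations from the mean" rule to flag unusual values. That threshold is a common rule of thumb, not the only option:

```python
import statistics

# Hypothetical sample: daily order counts over two weeks.
orders = [12, 15, 11, 14, 13, 40, 12, 16, 14, 13, 15, 12, 11, 14]

def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

def flag_outliers(values, k=2.0):
    """Return values more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > k * sd]

summary = summarize(orders)
for name, value in summary.items():
    print(f"{name:>6}: {round(value, 2)}")
print("possible outliers:", flag_outliers(orders))
```

Here the single day with 40 orders stands out. Exploration does not tell you *why*; it tells you where to look next (a promotion, a data-entry error, or something else).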

5. Model and Analyze

Models are mathematical tools that help you describe or predict things. In data science, models are used to:

At a beginner level, you might start with:

You don’t need advanced math right away; you can still do useful work with basic tools.
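A straight-line fit (simple linear regression) is a typical example of a basic modeling tool. As a minimal sketch, assuming a small, made-up dataset of advertising spend `xs` versus sales `ys` (the names and numbers are purely illustrative), the following fits the line by ordinary least squares in plain Python:

```python
# Hypothetical data: advertising spend (x) vs. sales (y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance-like sum divided by the spread of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(xs, ys)
print(f"y ~ {intercept:.2f} + {slope:.2f} * x")
print("prediction for x = 6:", round(intercept + slope * 6, 2))
```

On Python 3.10 or later, `statistics.linear_regression(xs, ys)` from the standard library returns the same slope and intercept; writing it out by hand just makes the arithmetic visible.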

6. Draw Conclusions and Communicate

Data science only pays off when its results lead to action.

Examples of conclusions:

Communication often involves:

7. Act and Iterate

After sharing results, people may:

New actions generate new data, which can be analyzed again. Data science is usually an ongoing cycle.


How Data Science Differs from Related Fields

Data science overlaps with several other areas, but the focus is slightly different.

Data Science vs Data Analysis

In practice, people use these terms loosely, and roles can overlap.

Data Science vs Machine Learning

You can do data science without heavy machine learning, especially when starting out.


Where Data Science Is Used

Data science shows up in many everyday products and services:

Anywhere decisions are made using data, data science is relevant.


Why Python Is Popular in Data Science

Python is one of the main languages used in data science because:

In the rest of this chapter’s sections, you’ll see how Python is used in practice for:

The Skills Data Scientists Use

Over time, a data scientist typically develops skills in:

You don’t need all of these to begin; you can build them step by step.


A Simple Example Flow

Here’s a very simplified example of what a small data science task might look like conceptually (without focusing on the code yet):

  1. Question:
    “At what time of day do we get the most website visitors?”
  2. Data:
    Website logs with:
    • Visit time
    • User ID
    • Page URL
  3. Steps:
    • Convert visit times to “hour of day” (0–23)
    • Count visits per hour
    • Plot a bar chart of visits vs hour
  4. Result:
    You might discover that:
    • Traffic peaks around 8–10 pm
    • Afternoons are quieter
  5. Action:
    Decide to:
    • Schedule important announcements in the evening
    • Plan maintenance for low-traffic hours
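For a preview of what the steps above might look like in code, here is a minimal sketch. It assumes the log is already a list of `(visit time, user ID, page URL)` tuples with timestamps in `YYYY-MM-DD HH:MM:SS` format; the entries are invented, so the "peak" they produce is by construction, not a real finding:

```python
from collections import Counter
from datetime import datetime

# Hypothetical log entries: (visit time, user ID, page URL).
visits = [
    ("2024-05-01 08:15:00", "u1", "/home"),
    ("2024-05-01 20:05:00", "u2", "/blog"),
    ("2024-05-01 20:40:00", "u3", "/home"),
    ("2024-05-01 21:10:00", "u1", "/pricing"),
    ("2024-05-02 14:30:00", "u4", "/home"),
    ("2024-05-02 20:55:00", "u5", "/blog"),
]

def visits_per_hour(rows):
    """Convert each visit time to its hour of day (0-23) and count visits per hour."""
    hours = (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour for ts, _, _ in rows)
    return Counter(hours)

counts = visits_per_hour(visits)

# Text bar chart: one '#' per visit, for every hour of the day (quiet hours stay empty).
for hour in range(24):
    print(f"{hour:02d} {'#' * counts[hour]}")

peak_hour, peak_visits = counts.most_common(1)[0]
print(f"Busiest hour: {peak_hour:02d}:00 with {peak_visits} visits")
```

In practice you would usually draw the chart with a plotting library such as matplotlib (for example `plt.bar`) instead of printing `#` characters; the text version keeps the sketch dependency-free.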

This is data science even without advanced models: using data to answer a real question and guide decisions.


What You’ll Explore Next

In the following sections of this chapter, you will:

This will give you a practical first taste of doing data science with Python.
