11  Introduction to Python and AI

So far, everything in this book has been written in R. R is an excellent tool for statistical analysis, data visualization, and working with structured datasets, and it will remain useful throughout your career. But Python has become the dominant language for working with APIs, building web applications, and integrating AI into your projects. Rather than choosing between them, many working data analysts and developers learn both.

This chapter introduces Python from the ground up, assuming you know R but have never touched Python. We’ll cover the language basics, package management, working with data, connecting to APIs, and finally building a simple AI-powered web application.

11.1 Python vs. R: What’s Different?

Python and R share a lot of philosophy — both are interpreted, package-based languages that encourage readable code. The differences are mostly practical:

  • R was built by statisticians, for statistics. Its visualization tools (ggplot2) and modeling capabilities are unmatched.
  • Python was built as a general-purpose language. It’s better for building applications, automating workflows, web scraping, and connecting to AI services.

You’ll find the two languages feel similar once you get past the syntax.

11.2 Setting Up Python and PyCharm

11.2.1 Step 1: Install Python

Download the latest version of Python from the official website:

https://www.python.org/downloads/

Click the big yellow Download Python button — it will automatically offer the right version for your operating system. Run the installer.

Important

Mac users: On the last screen of the installer, double-click the Install Certificates script. Without this, some packages won’t be able to connect to the internet.

Windows users: On the very first screen of the installer, check the box that says “Add Python to PATH” before clicking Install. If you miss this, uninstall and re-install.

11.2.2 Step 2: Install PyCharm

PyCharm is a professional Python editor made by JetBrains. The Community edition is completely free:

https://www.jetbrains.com/pycharm/download/

Scroll down to PyCharm Community Edition and download it. Install it as you would any application.

11.2.3 Step 3: Create a New Project

Open PyCharm. On the welcome screen, click New Project.

  • Set the Location to a folder on your computer — name it something like python_analytics.
  • Leave all other settings at their defaults. PyCharm will automatically create a virtual environment for this project, which keeps your installed packages organized and separate from other projects.
  • Click Create.

11.2.4 Using the Terminal in PyCharm

At the bottom of the PyCharm window, you’ll see a row of tabs. Click Terminal. This opens a command-line shell directly inside your project.

Note

Always use PyCharm’s built-in Terminal for installing packages. It automatically uses the correct version of Python and pip for your project, regardless of whether you’re on a Mac or PC. This avoids a lot of confusion.

To create a new Python file: in the Project panel on the left, right-click your project folder → New → Python File → give it a name.

11.3 Python Basics

11.3.1 Variables and Types

In R, we assign variables with <-. In Python, it’s just =:

name = "Brian"
year = 1975
proportion = 0.042
is_popular = True

Python has the same basic types: strings, integers, floats, and booleans. Unlike R, Python is zero-indexed — the first item in a list is at position 0, not 1.
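
To check these types for yourself, here is a small sketch using Python’s built-in type() function, which plays roughly the role of class() in R. The variables repeat the ones above; type_names and first_letter are illustrative names, not part of the chapter’s project.

```python
# The same four variables as above
name = "Brian"
year = 1975
proportion = 0.042
is_popular = True

# type() reports the type of a value, much like class() in R
type_names = [
    type(name).__name__,
    type(year).__name__,
    type(proportion).__name__,
    type(is_popular).__name__,
]
print(type_names)

# Zero-indexing applies to strings too: position 0 is the first character
first_letter = name[0]
print(first_letter)
```

Running this in a scratch file is a quick way to confirm what kind of value a variable holds when something behaves unexpectedly.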

11.3.2 Printing Output

In R, typing a variable name shows its value. In Python, use print():

print(name)
print(year)

11.3.3 Lists

The Python equivalent of R’s c() is a list:

beatles = ["John", "Paul", "George", "Ringo"]

print(beatles[0])   # "John"  — zero-indexed
print(beatles[-1])  # "Ringo" — negative index counts from the end
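
A few more list operations come up constantly. The sketch below shows len(), slicing, and append(); the extra name added at the end is purely illustrative.

```python
beatles = ["John", "Paul", "George", "Ringo"]

# len() counts the items, like length() in R
print(len(beatles))

# A slice starts at the first index and stops BEFORE the second one
first_two = beatles[0:2]
print(first_two)

# Lists can grow in place; append() adds one item to the end
beatles.append("Billy")
print(beatles)
```

Note that the slice 0:2 returns two items (positions 0 and 1), not three. This "stop before the end index" rule is a common surprise for R users.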

11.3.4 Dictionaries

Dictionaries store key-value pairs, similar to a named list in R:

song = {
    "title": "Let It Be",
    "artist": "The Beatles",
    "year": 1970
}

print(song["title"])   # "Let It Be"
print(song["year"])    # 1970

Dictionaries are especially important later: API responses come back as JSON, which Python reads directly as a dictionary.
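
To make the JSON connection concrete, here is a minimal sketch using Python’s standard json module. The JSON string is invented for illustration and does not come from any real API.

```python
import json

# A JSON string shaped like a (made-up) API response
raw_json = '{"title": "Let It Be", "artist": "The Beatles", "year": 1970, "tags": ["rock", "1970s"]}'

# json.loads() turns JSON text into Python dictionaries and lists
song = json.loads(raw_json)

print(type(song).__name__)   # the top-level JSON object becomes a dict
print(song["title"])         # look up a value by key
print(song["tags"][0])       # JSON arrays become lists, so indexing works

# .get() returns a fallback instead of raising an error when a key is missing
album = song.get("album", "unknown")
print(album)
```

The .get() method is worth remembering: real API responses often omit fields, and song["album"] would raise a KeyError here.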

11.3.5 Loops

In R, you often use vectorized operations instead of loops. In Python, loops are common and idiomatic:

for name in beatles:
    print(name)

Note the indentation: Python uses whitespace to define code blocks, instead of curly braces {}. Use four spaces per level, consistently, and don’t mix tabs and spaces in the same file (PyCharm inserts spaces when you press Tab). Getting indentation wrong is one of the most common Python errors for beginners.
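
Indentation matters most when blocks are nested. In the sketch below, the if sits inside the for, so it is indented one level and its body two levels. The short_names list is an illustrative name.

```python
beatles = ["John", "Paul", "George", "Ringo"]
short_names = []

# enumerate() yields each item together with its (zero-based) position
for position, name in enumerate(beatles):
    print(position, name)
    # This if-block belongs to the loop because it is indented under it
    if len(name) <= 4:
        short_names.append(name)

# Back at the left margin: this line runs once, after the loop finishes
print(short_names)
```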

11.3.6 Functions

def greet(name):
    return "Hello, " + name

print(greet("Ringo"))
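
Functions can also take default argument values, much like R. This variation on greet() is a sketch; the greeting parameter is added here for illustration.

```python
def greet(name, greeting="Hello"):
    """Return a greeting for name; greeting defaults to 'Hello'."""
    return greeting + ", " + name

# Uses the default greeting
default_greeting = greet("Ringo")
print(default_greeting)

# Overrides the default by naming the argument
custom_greeting = greet("Ringo", greeting="Goodbye")
print(custom_greeting)
```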

11.3.7 f-Strings

The cleanest way to embed variables inside strings:

artist = "James Taylor"
year = 1971
print(f"{artist} released Mud Slide Slim in {year}.")

The f before the quote tells Python to evaluate anything inside {} as a variable or expression. You’ll use f-strings constantly.
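
f-strings can also format numbers and evaluate small expressions inside the braces. The values below are illustrative.

```python
proportion = 0.042
count = 1234567
year = 1971

# A format spec follows a colon inside the braces
percent_text = f"{proportion:.1%}"   # percentage with one decimal place
count_text = f"{count:,}"            # thousands separators

# Any expression works inside the braces, not just a variable name
decade_later = f"{year + 10}"

print(percent_text, count_text, decade_later)
```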

11.4 Installing Packages

Like R, Python is package-based. The package manager is called pip. You run it from the terminal, not from inside Python itself.

In PyCharm’s Terminal, use this command on both Mac and PC:

python -m pip install pandas

The python -m pip syntax ensures you’re always using the pip that belongs to the Python interpreter your project is using. It works identically on both platforms.

Note

Why not just pip install?

On Mac, if you ever open a terminal outside of PyCharm, pip may point to a different or older Python installation than the one you intend. python -m pip always runs the pip that belongs to whichever python the command launches. On Windows, pip install usually works fine outside of PyCharm too, but python -m pip is the safer habit.

When using PyCharm’s built-in Terminal, pip install works correctly on both platforms because PyCharm configures the environment for you. In this book, we’ll use python -m pip install throughout for consistency.
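
If you are ever unsure which Python your project is using, you can ask Python itself. This short sketch relies only on the standard sys module; the variable names are illustrative.

```python
import sys

# Full path of the interpreter running this script
interpreter_path = sys.executable
print(interpreter_path)

# Major and minor version, e.g. "3.12"
version_text = f"{sys.version_info.major}.{sys.version_info.minor}"
print(version_text)
```

In a default PyCharm project, the printed path will usually point inside the project’s virtual environment folder, which is where python -m pip install places packages.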

Once installed, you import packages at the top of your Python file:

import pandas as pd
import matplotlib.pyplot as plt
import requests

The as pd and as plt parts create aliases — a universal convention for these two packages. Install once; import in every file that needs it.

11.5 Working with Data

The primary data package in Python is pandas, which introduces the DataFrame — the same concept as R’s data frame. If you know dplyr, you already understand pandas conceptually.

11.5.1 Downloading the Data

For this chapter, we’ll use the US baby names dataset — the same underlying data as the babynames R package we’ve used throughout this book.

Download the file NationalNames.csv from Kaggle:

https://www.kaggle.com/datasets/kaggle/us-baby-names

You’ll need a free Kaggle account. Once downloaded, place NationalNames.csv directly inside your PyCharm project folder (the same folder as your Python files).

Tip

How to find your project folder: In PyCharm’s Project panel on the left, right-click your project name and choose Open In → Finder (Mac) or Open In → Explorer (Windows). Drag your downloaded CSV into that folder.

11.5.2 Loading Data

import pandas as pd

babynames = pd.read_csv('NationalNames.csv')

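Notice we didn’t need to specify a full path. A bare file name is a relative path, which Python resolves against the current working directory, and PyCharm normally sets that to the folder containing the script you run. Because our CSV sits in the same project folder, Python finds it. This is one of the key benefits of using PyCharm’s project system.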
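
If pandas reports that it cannot find the file, it helps to see where Python is actually looking. The sketch below uses the standard pathlib module and assumes only the file name used above.

```python
from pathlib import Path

csv_path = Path("NationalNames.csv")

# The folder that relative paths are resolved against
print(Path.cwd())

# The full path Python will try to open for this file name
full_path = csv_path.resolve()
print(full_path)

# True only if the file is really there
print(csv_path.exists())
```

If exists() prints False, the CSV is most likely in a different folder from the one shown by Path.cwd().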

You can also load data directly from a URL, without downloading anything first:

covid = pd.read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv")

This is especially useful when working with data that updates regularly or lives on GitHub.

11.5.3 Exploring Your Data

babynames.head()       # first 5 rows
babynames.tail(10)     # last 10 rows
babynames.info()       # column names, types, non-null counts
babynames.describe()   # summary statistics
babynames.shape        # (rows, columns) as a tuple
babynames.columns      # list of column names

The NationalNames.csv file has five columns: Id, Name, Year, Gender, and Count. This is essentially the same data as the R babynames package, with Count in place of n.

11.5.4 Filtering

Where R uses filter(), pandas uses boolean indexing:

# Only rows where Name is "Ringo"
ringo = babynames[babynames['Name'] == 'Ringo']

# Multiple conditions — wrap each condition in parentheses, use & for AND, | for OR
ringo_male = babynames[(babynames['Name'] == 'Ringo') & (babynames['Gender'] == 'M')]
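
Boolean indexing is easier to follow on a tiny table. The sketch below uses an invented four-row DataFrame with the same column names as NationalNames.csv; the counts are made up.

```python
import pandas as pd

# Tiny invented table with the same column names as NationalNames.csv
toy = pd.DataFrame({
    "Name": ["Ringo", "Ringo", "Paul", "Paula"],
    "Year": [1964, 1965, 1964, 1964],
    "Gender": ["M", "M", "M", "F"],
    "Count": [8, 12, 900, 700],
})

# A comparison produces one True/False value per row
is_ringo = toy["Name"] == "Ringo"
print(is_ringo.tolist())

# Putting that mask inside [] keeps only the True rows
ringo_1965 = toy[is_ringo & (toy["Year"] == 1965)]
print(ringo_1965)

# .isin() matches any value in a list, like %in% in R
paul_or_paula = toy[toy["Name"].isin(["Paul", "Paula"])]
print(paul_or_paula)
```

The parentheses around each condition are required because & binds more tightly than == in Python.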

11.5.5 Sorting

babynames.sort_values('Count', ascending=False).head(10)

11.5.6 Grouping and Summarizing

The pandas equivalent of group_by() |> summarize():

name_totals = (
    babynames
    .groupby('Name')['Count']
    .sum()
    .reset_index()
    .sort_values('Count', ascending=False)
)

name_totals.head(10)
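
When you want several summaries at once, as you would with multiple arguments to summarize(), pandas offers named aggregation through .agg(). This is a sketch on invented data; the column names total and peak are illustrative.

```python
import pandas as pd

# Invented counts, using the same column names as NationalNames.csv
toy = pd.DataFrame({
    "Name": ["Brian", "Brian", "Paul", "Paul", "Ringo"],
    "Year": [1974, 1975, 1974, 1975, 1975],
    "Count": [100, 120, 300, 280, 5],
})

summary = (
    toy
    .groupby("Name")
    # new_column=(source_column, function)
    .agg(total=("Count", "sum"), peak=("Count", "max"))
    .reset_index()
    .sort_values("total", ascending=False)
)

print(summary)
```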

11.5.7 Creating New Columns

The pandas equivalent of mutate():

babynames['name_length'] = babynames['Name'].str.len()

11.5.8 Visualizing with Matplotlib

import matplotlib.pyplot as plt

# Filter to Brian, male only
brian = babynames[(babynames['Name'] == 'Brian') & (babynames['Gender'] == 'M')]

plt.plot(brian['Year'], brian['Count'])
plt.xlabel('Year')
plt.ylabel('Count')
plt.title('Popularity of the Name Brian Over Time')
plt.show()

For a bar chart of the most popular names in 1975:

top_1975 = (
    babynames[babynames['Year'] == 1975]
    .sort_values('Count', ascending=False)
    .head(10)
)

plt.figure(figsize=(10, 6))
plt.barh(top_1975['Name'], top_1975['Count'])
plt.xlabel('Count')
plt.title('Top 10 Names in 1975')
plt.gca().invert_yaxis()   # put the most popular name at the top
plt.tight_layout()
plt.show()

11.6 Working with APIs

An API (Application Programming Interface) lets you query an external database and get back only the data you need. Python’s requests library handles this cleanly. Most APIs return data in JSON format, which Python reads as a dictionary.

11.6.1 Installing Requests

In PyCharm’s Terminal:

python -m pip install requests

11.6.2 A Basic API Request

Let’s use the Open Meteo weather API, which requires no account or key:

import requests

url = "https://api.open-meteo.com/v1/forecast"
params = {
    "latitude": 40.7128,
    "longitude": -74.0060,
    "current_weather": True
}

response = requests.get(url, params=params)
data = response.json()

print(data['current_weather'])

Here is what each piece does:

  • requests.get() sends an HTTP GET request to the URL
  • .json() parses the response as a Python dictionary
  • We navigate the result with bracket notation, the same as a dictionary
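
It can help to see what the params dictionary turns into. The sketch below uses the standard library’s urlencode to build a query string by hand. It illustrates the idea and is not the code requests runs internally, though the resulting URL should look much the same.

```python
from urllib.parse import urlencode

url = "https://api.open-meteo.com/v1/forecast"
params = {
    "latitude": 40.7128,
    "longitude": -74.0060,
    "current_weather": True,
}

# Each key/value pair becomes key=value, joined with &
query_string = urlencode(params)

# The query string is attached to the base URL after a ?
full_url = f"{url}?{query_string}"
print(full_url)
```

Pasting a URL like this into a browser is a handy way to inspect an API’s raw JSON before writing any more code.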

11.6.3 Checking for Errors

APIs can fail — bad keys, rate limits, server issues. Always check the status code:

if response.status_code == 200:
    data = response.json()
    print("Success!")
else:
    print(f"Error {response.status_code}: {response.text}")

200 = success. 401 = bad API key. 429 = rate limit exceeded.
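
Python’s standard library already knows the official name of each status code. The helper below, describe_status (a hypothetical name), is a sketch of how you might turn a code into a readable label for error messages.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short human-readable label for an HTTP status code."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        # Raised when the code is not a standard HTTP status
        return f"{code} (unrecognized status code)"

labels = [describe_status(code) for code in (200, 401, 429)]
for label in labels:
    print(label)
```

In a real script you would pass response.status_code to a helper like this.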

11.6.4 The Genius API: Getting Song Lyrics

The Genius API lets you search song lyrics and artist data — a great source for media analytics. To get your API key:

  1. Create a free account at genius.com
  2. Go to https://genius.com/api-clients
  3. Click New API Client, fill in a name and website (anything works), and click Save
  4. Click Generate Access Token — copy it and keep it safe

First, install the lyricsgenius wrapper package, which makes the Genius API much easier to use than calling it directly:

python -m pip install lyricsgenius

from lyricsgenius import Genius

genius = Genius("YOUR_CLIENT_ACCESS_TOKEN_HERE")

# Download an entire album's lyrics
album = genius.search_album("Blood on the Tracks", "Bob Dylan")
album.save_lyrics()   # saves to a JSON file in your project folder

# Or search for a single song
song = genius.search_song("Fire and Rain", "James Taylor")
print(song.lyrics[:500])   # first 500 characters of the lyrics

You can then load the saved JSON file with pandas and analyze the lyrics using the same text analysis techniques from the Text Analysis chapter — but now in Python.

11.7 Connecting to an AI API

Large language models from OpenAI can be called from Python in the same way we call any other API. The difference is that your ‘request’ is a conversation, and the ‘response’ is AI-generated text.

11.7.1 Getting an OpenAI API Key

Go to https://platform.openai.com and create an account. In your account menu, navigate to API Keys and click Create new secret key. Copy it immediately — you will not be able to see it again after closing the window.

11.7.2 Install the Required Packages

In PyCharm’s Terminal:

python -m pip install openai python-dotenv

11.7.3 Storing Your API Key Safely: The .env File

Your API key is essentially a password — it grants access to a service you pay for. Never paste it directly into your Python code and never commit it to GitHub. The standard way to handle this is a .env file.

A .env file is a plain text file that lives in your project folder and stores sensitive values like API keys. Python can read it with a package called dotenv. The file never gets shared or published because we add it to .gitignore.

11.7.3.1 Creating a .env File in PyCharm (Mac and PC)

This is the easiest method and works identically on both platforms:

  1. In PyCharm’s Project panel on the left, right-click your project folder (the top-level folder)
  2. Choose New → File
  3. Type .env as the filename (the dot at the beginning is intentional and important)
  4. Click OK

PyCharm will open the file. Type your API key like this — no quotes, no spaces:

OPENAI_API_KEY=sk-your-actual-key-goes-here

Save the file. That’s it.

Note

Why the dot in front? Files starting with a dot are hidden files on Mac and Linux — they don’t show up in Finder or regular directory listings. This is a convention for configuration files. PyCharm shows them in its Project panel regardless.

11.7.3.2 Creating a .env File Outside of PyCharm

If you prefer to use a plain text editor:

On a Mac:

  1. Open TextEdit
  2. Go to Format → Make Plain Text (important — if you skip this, it will save as a .rtf file, which won’t work)
  3. Type your key: OPENAI_API_KEY=sk-your-key-here
  4. Go to File → Save As, name it .env, and navigate to your project folder
  5. When prompted, click Use .env to confirm the unusual filename

Alternatively, use the Terminal app:

cd ~/your-project-folder
nano .env

Type your key, then press Control+O to save and Control+X to exit.

On Windows:

  1. Open Notepad
  2. Type your key: OPENAI_API_KEY=sk-your-key-here
  3. Go to File → Save As
  4. In the Save as type dropdown, choose All Files (*.*)
  5. Set the filename to .env and navigate to your project folder
  6. Click Save

11.7.3.3 Reading the .env File in Python

from dotenv import load_dotenv
import os

load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

load_dotenv() reads your .env file and makes its values available as environment variables. os.getenv() retrieves them by name. After this, the OpenAI client will find the key automatically.
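
os.getenv() quietly returns None when a variable is missing, which can lead to confusing errors later. One defensive option, sketched here with a hypothetical helper named require_env, is to fail early with a clear message. The demo variable is a throwaway value, not a real key.

```python
import os

def require_env(name):
    """Return the environment variable's value, or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Check that your .env file exists "
            "and that load_dotenv() ran before this line."
        )
    return value

# Demonstration with a throwaway variable (NOT a real API key)
os.environ["DEMO_API_KEY"] = "not-a-real-key"
demo_value = require_env("DEMO_API_KEY")
print(demo_value)
```

In the chapter’s scripts, you would call require_env("OPENAI_API_KEY") right after load_dotenv().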

11.7.3.4 Keeping the Key Out of Git

If you use git and GitHub, add .env to your .gitignore file so it never gets pushed to a public repository. In PyCharm’s Terminal:

echo ".env" >> .gitignore

Or open .gitignore in PyCharm and add .env on its own line.

11.7.4 Your First AI Call

from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI()   # automatically reads OPENAI_API_KEY from your .env

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What are the top five most streamed artists on Spotify?"}
    ]
)

print(response.choices[0].message.content)

The messages parameter is a list of dictionaries. Each message has a role:

  • "user" — your message to the AI
  • "assistant" — the AI’s previous replies
  • "system" — instructions about how the AI should behave (not shown to the user)

11.7.5 The System Prompt

The "system" role is how you customize the AI’s persona and behavior before the conversation begins:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You are a media analytics assistant. Answer questions in the context of journalism, communications, and digital media."
        },
        {
            "role": "user",
            "content": "What does engagement rate mean?"
        }
    ]
)

print(response.choices[0].message.content)

11.7.6 Maintaining a Conversation

The API is stateless — it has no memory between separate calls. To maintain a multi-turn conversation, you send the full history of messages with every request:

chat_history = [
    {"role": "system", "content": "You are a helpful media analytics assistant."}
]

# Turn 1
chat_history.append({"role": "user", "content": "What is sentiment analysis?"})
response = client.chat.completions.create(model="gpt-4o", messages=chat_history)
reply = response.choices[0].message.content
chat_history.append({"role": "assistant", "content": reply})
print(reply)

# Turn 2 — the model "remembers" the previous exchange because we sent the full history
chat_history.append({"role": "user", "content": "Can you give me a Python example of that?"})
response = client.chat.completions.create(model="gpt-4o", messages=chat_history)
reply = response.choices[0].message.content
chat_history.append({"role": "assistant", "content": reply})
print(reply)

This pattern — appending each exchange and re-sending the full list — is the core mechanic behind every chatbot, regardless of how they’re presented to users.
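
The same pattern can be wrapped in a small function so each turn is one call. The sketch below swaps the real API for a stand-in function, fake_model, so it runs without a key or network connection. Both fake_model and take_turn are hypothetical names.

```python
def fake_model(messages):
    """Stand-in for the API call: reports how many messages it was sent."""
    return f"(I can see {len(messages)} messages so far.)"

def take_turn(chat_history, user_text, model=fake_model):
    """Append the user's message, get a reply, and append that too."""
    chat_history.append({"role": "user", "content": user_text})
    reply = model(chat_history)   # the FULL history goes in every time
    chat_history.append({"role": "assistant", "content": reply})
    return reply

chat_history = [
    {"role": "system", "content": "You are a helpful media analytics assistant."}
]

first_reply = take_turn(chat_history, "What is sentiment analysis?")
second_reply = take_turn(chat_history, "Can you give me a Python example of that?")

print(first_reply)
print(second_reply)
print(len(chat_history))
```

To use the real model, you could replace fake_model with a function that calls client.chat.completions.create() and returns response.choices[0].message.content.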

11.8 Building an AI App with chat_with_memory

We’ve seen how to call an AI API from a script. Now let’s turn that into a web application that runs in a browser and lets anyone have a conversation with an AI.

The application we’ll build uses:

  • Quart: a lightweight Python web framework (similar to Flask, but built around the async/await style of API calls that this app uses)
  • Session storage: stores the conversation history in a cookie in the user’s browser between requests, giving the AI “memory”
  • Jinja2 templates: HTML files with placeholders that Quart fills in with real data before sending to the browser

11.8.1 Install Required Packages

In PyCharm’s Terminal:

python -m pip install quart openai python-dotenv

11.8.2 Project Structure

Your project folder should look exactly like this:

your-project-folder/
├── chat_with_memory.py
├── .env
└── templates/
    └── index.html

The templates/ folder matters: by default, Quart (like Flask) looks for HTML templates in a subfolder with exactly that name.

11.8.2.1 Creating the templates folder in PyCharm

  1. In the Project panel, right-click your project folder
  2. Choose New → Directory
  3. Name it templates (all lowercase) and press Enter

11.8.2.2 Creating index.html in PyCharm

  1. Right-click the templates folder
  2. Choose New → HTML File
  3. Name it index and press Enter

PyCharm will create templates/index.html with some starter HTML. Replace all of it with the following:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>AI Chat</title>
    <style>
        body {
            font-family: Verdana, sans-serif;
            max-width: 700px;
            margin: 40px auto;
            padding: 0 20px;
        }
        textarea {
            width: 100%;
            font-size: 1em;
            padding: 8px;
        }
        button {
            margin-top: 10px;
            padding: 10px 20px;
            font-size: 1em;
        }
        #response-box {
            margin-top: 20px;
            padding: 16px;
            background: #f4f4f4;
            border-radius: 6px;
            white-space: pre-wrap;
        }
        .spinner {
            display: none;
            margin: 10px 0;
            font-style: italic;
            color: #888;
        }
    </style>
</head>
<body>
    <h1>Ask Me Anything</h1>

    <form action="/chat" method="post" id="chat-form">
        <textarea name="user_input" rows="4" placeholder="Ask something..." required></textarea>
        <br>
        <button type="submit" id="submit-btn">Send</button>
    </form>

    <p class="spinner" id="loading">Thinking...</p>

    <div id="response-box">
        <strong>Response:</strong><br><br>
        {{ assistant_reply | safe }}
    </div>

    <script>
        document.getElementById('chat-form').addEventListener('submit', function() {
            document.getElementById('loading').style.display = 'block';
            document.getElementById('submit-btn').disabled = true;
        });
    </script>
</body>
</html>

The {{ assistant_reply | safe }} placeholder is Jinja2 syntax: Quart replaces it with the actual AI response before sending the page to the browser. The | safe filter turns off Jinja2’s automatic HTML escaping, so any HTML in the response renders as markup rather than as literal text.
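
To see what the | safe filter changes, it helps to look at what HTML escaping does to a string. The sketch below uses the standard library’s html.escape as a stand-in; it is not Jinja2’s own function, but the effect on the text is similar. The reply text is invented.

```python
from html import escape

# An invented reply that happens to contain HTML
reply = "Use <b>bold</b> & <script>alert('hi')</script>"

# Escaping replaces the characters HTML treats as markup
escaped_reply = escape(reply)

print(reply)           # with | safe, the browser receives this as-is
print(escaped_reply)   # without it, the browser receives something like this
```

Because the unescaped version lets any tags in the reply run in the page, you may prefer to drop | safe if you do not need the model’s HTML to render.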

11.8.3 The Application: chat_with_memory.py

Create a new Python file in your project folder called chat_with_memory.py and add the following:

from quart import Quart, render_template, request, session
from dotenv import load_dotenv
from openai import AsyncOpenAI

load_dotenv()

client = AsyncOpenAI()
app = Quart(__name__)
app.secret_key = "change-this-to-something-unique"

@app.route('/', methods=['GET'])
async def index():
    return await render_template('index.html', assistant_reply="")

@app.route('/chat', methods=['POST'])
async def chat():
    try:
        form_data = await request.form
        query = form_data['user_input']

        # Get the conversation history from the session, or start a fresh one
        chat_history = session.get('chat_history', [])

        # Remove old system messages so we always have a fresh one at the top
        chat_history = [m for m in chat_history if m["role"] != "system"]
        chat_history.insert(0, {"role": "system", "content": "You are a helpful assistant."})

        # Add the user's new message
        chat_history.append({"role": "user", "content": query})

        # Call the OpenAI API with the full conversation history
        response = await client.chat.completions.create(
            model="gpt-4o",
            messages=chat_history
        )

        assistant_response = response.choices[0].message.content
        chat_history.append({"role": "assistant", "content": assistant_response})

        # Save the updated history back to the session
        session['chat_history'] = chat_history

        return await render_template('index.html', assistant_reply=assistant_response)

    except Exception as e:
        app.logger.error(f"Error: {e}")
        return await render_template("index.html", assistant_reply="Something went wrong. Please try again.")

if __name__ == "__main__":
    app.run(debug=True, port=8080)

11.8.4 Walking Through the Code

load_dotenv() reads your .env file so the API key is available without being written into the code.

AsyncOpenAI() creates an OpenAI client that uses asynchronous calls. The async/await keywords throughout the file tell Python: “while we’re waiting for OpenAI to respond, the web server can handle other tasks.” Quart is built around this style, which is why we use it here instead of regular Flask.

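app.secret_key is used by Quart to cryptographically sign the session data stored in the user’s browser cookie, so the server can detect tampering. Signing is not the same as encryption: the cookie’s contents can still be decoded by whoever holds it, so don’t store secrets in the session. Change this string to something unique and hard to guess in any real deployment.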
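
One way to produce a suitably unpredictable value for app.secret_key is Python’s standard secrets module. This is a sketch; the 32-byte length is a reasonable choice here, not a requirement stated by Quart.

```python
import secrets

# 32 random bytes, written as 64 hexadecimal characters
secret_key = secrets.token_hex(32)
print(secret_key)
```

Run this once, then store the printed value somewhere private (for example in your .env file under a name of your choosing) rather than generating a new key on every start, since changing the key invalidates existing session cookies.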

Routes are how a web application responds to different URLs. The @app.route('/') decorator connects the homepage to the index() function, which just renders the empty chat form. The @app.route('/chat') route handles form submissions.

session is a dictionary-like object that Quart stores as a signed cookie in the user’s browser. It persists between page loads, which is exactly how the app ‘remembers’ the conversation. Each time the user submits a message, we retrieve the history, add the new exchange, and save it back.

11.8.5 Running the App

Make sure your .env file is in the project folder, then open PyCharm’s Terminal and run:

python chat_with_memory.py

You’ll see output like:

Running on http://127.0.0.1:8080

Open http://127.0.0.1:8080 in your browser. Type a question, click Send, and the AI will respond. Ask a follow-up — it will remember what was said before.

To stop the app, click in the Terminal and press Control+C.

Note

Why port 8080? Port 5000 is used by Apple’s AirPlay Receiver service on modern Macs, which causes a conflict. Port 8080 is a standard alternative that works on both Mac and Windows without any conflicts.

11.8.6 Customizing the App

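Change the AI’s persona by editing the system message inside chat_with_memory.py. For example, define a system_prompt variable near the top of the file and use it in place of the hard-coded "You are a helpful assistant." string: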

system_prompt = """You are a media analytics tutor specializing in
digital journalism and audience measurement. Give clear, practical
examples using real platforms like Instagram, YouTube, and Spotify."""

Reset the conversation: Because history is stored in a session cookie, it usually survives closing a tab and lasts until the browser itself is closed. To start over sooner, you can add a “Clear” button that calls a route to delete session['chat_history'].

11.8.7 Next Steps

From here, natural extensions include:

  • Streaming responses: display the AI’s text piece by piece as it’s generated instead of waiting for the full response. The OpenAI Python SDK supports this natively with stream=True.
  • A persistent database: browsers cap each cookie at roughly 4 KB, so a long conversation will eventually outgrow the session cookie. A database (SQLite, PostgreSQL) avoids that limit and lets users return to previous conversations.
  • Deploying online: platforms like Render, Railway, or Heroku can host a Quart app at low cost (free tiers vary by platform), making your app accessible to anyone on the web, not just on your own computer.
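
Until you add a database, a simple stopgap for the cookie size limit is to keep only the most recent messages. The sketch below uses a hypothetical helper, trim_history, and an arbitrary default of 10 messages; neither is part of the app above.

```python
def trim_history(chat_history, max_messages=10):
    """Keep system messages plus only the most recent other messages."""
    system_messages = [m for m in chat_history if m["role"] == "system"]
    other_messages = [m for m in chat_history if m["role"] != "system"]
    # Guard against max_messages=0, because a [-0:] slice would keep everything
    recent = other_messages[-max_messages:] if max_messages > 0 else []
    return system_messages + recent

# Invented history: one system message plus three user/assistant exchanges
history = [{"role": "system", "content": "You are a helpful assistant."}]
for number in range(1, 4):
    history.append({"role": "user", "content": f"Question {number}"})
    history.append({"role": "assistant", "content": f"Answer {number}"})

trimmed = trim_history(history, max_messages=4)
print(len(history), len(trimmed))
print(trimmed[1]["content"])
```

In chat_with_memory.py, you could call a helper like this just before saving chat_history back to the session. The trade-off is that the model forgets anything older than the messages you keep.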