11 Introduction to Python and AI
So far, everything in this book has been written in R. R is an excellent tool for statistical analysis, data visualization, and working with structured datasets — and it will remain useful throughout your career. But Python has become the dominant language for working with APIs, building web applications, and integrating AI into your projects. Rather than choosing between them, most working data analysts and developers know both.
This chapter introduces Python from the ground up, assuming you know R but have never touched Python. We’ll cover the language basics, package management, working with data, connecting to APIs, and finally building a simple AI-powered web application.
11.1 Python vs. R: What’s Different?
Python and R share a lot of philosophy — both are interpreted, package-based languages that encourage readable code. The differences are mostly practical:
- R was built by statisticians, for statistics. Its visualization tools (ggplot2) and modeling capabilities are unmatched.
- Python was built as a general-purpose language. It’s better for building applications, automating workflows, web scraping, and connecting to AI services.
You’ll find the two languages feel similar once you get past the syntax.
11.2 Setting Up Python and PyCharm
11.2.1 Step 1: Install Python
Download the latest version of Python from the official website: https://www.python.org/downloads/
Click the big yellow Download Python button — it will automatically offer the right version for your operating system. Run the installer.
Mac users: On the last screen of the installer, double-click the Install Certificates script. Without this, some packages won’t be able to connect to the internet.
Windows users: On the very first screen of the installer, check the box that says “Add Python to PATH” before clicking Install. If you miss this, uninstall and re-install.
11.2.2 Step 2: Install PyCharm
PyCharm is a professional Python editor made by JetBrains. The Community edition is completely free: https://www.jetbrains.com/pycharm/download/
Scroll down to PyCharm Community Edition and download it. Install it as you would any application.
11.2.3 Step 3: Create a New Project
Open PyCharm. On the welcome screen, click New Project.
- Set the Location to a folder on your computer — name it something like python_analytics.
- Leave all other settings at their defaults. PyCharm will automatically create a virtual environment for this project, which keeps your installed packages organized and separate from other projects.
- Click Create.
11.2.4 Using the Terminal in PyCharm
At the bottom of the PyCharm window, you’ll see a row of tabs. Click Terminal. This opens a command-line shell directly inside your project.
Always use PyCharm’s built-in Terminal for installing packages. It automatically uses the correct version of Python and pip for your project, regardless of whether you’re on a Mac or PC. This avoids a lot of confusion.
To create a new Python file: in the Project panel on the left, right-click your project folder → New → Python File → give it a name.
11.3 Python Basics
11.3.1 Variables and Types
In R, we assign variables with <-. In Python, it’s just =:
name = "Brian"
year = 1975
proportion = 0.042
is_popular = True

Python has the same basic types: strings, integers, floats, and booleans. Unlike R, Python is zero-indexed — the first item in a list is at position 0, not 1.
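If you ever want to confirm what type a value has (the rough equivalent of R's class()), Python's built-in type() will tell you:

```python
name = "Brian"
year = 1975
proportion = 0.042
is_popular = True

# type() is Python's rough equivalent of R's class()
print(type(name))        # <class 'str'>
print(type(year))        # <class 'int'>
print(type(proportion))  # <class 'float'>
print(type(is_popular))  # <class 'bool'>
```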
11.3.2 Printing Output
In R, typing a variable name shows its value. In Python, use print():
print(name)
print(year)

11.3.3 Lists
The Python equivalent of R’s c() is a list:
beatles = ["John", "Paul", "George", "Ringo"]
print(beatles[0]) # "John" — zero-indexed
print(beatles[-1]) # "Ringo" — negative index counts from the end

11.3.4 Dictionaries
Dictionaries store key-value pairs, similar to a named list in R:
song = {
    "title": "Let It Be",
    "artist": "The Beatles",
    "year": 1970
}
print(song["title"]) # "Let It Be"
print(song["year"]) # 1970

Dictionaries are especially important later: API responses come back as JSON, which Python reads directly as a dictionary.
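To preview that connection, here is a minimal sketch using Python's built-in json module: a JSON string (the kind of text an API returns) parses directly into a dictionary.

```python
import json

# A JSON string, shaped the way an API might return it
raw = '{"title": "Let It Be", "artist": "The Beatles", "year": 1970}'

song = json.loads(raw)   # parse the JSON text into a Python dictionary
print(song["artist"])    # The Beatles
print(song["year"])      # 1970
```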
11.3.5 Loops
In R, you often use vectorized operations instead of loops. In Python, loops are common and idiomatic:
for name in beatles:
    print(name)

Note the indentation — Python uses whitespace to define code blocks, instead of curly braces {}. Use four spaces (or one Tab) consistently. Getting indentation wrong is the single most common Python error for beginners.
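Indentation also expresses nesting: each extra level of indentation opens a deeper block. For example:

```python
beatles = ["John", "Paul", "George", "Ringo"]

for name in beatles:
    # this line belongs to the for loop (indented once)
    if name.startswith("G"):
        # this line belongs to the if statement (indented twice)
        print(f"{name} starts with G")
```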
11.3.6 Functions
def greet(name):
    return "Hello, " + name

print(greet("Ringo"))

11.3.7 f-Strings
The cleanest way to embed variables inside strings:
artist = "James Taylor"
year = 1971
print(f"{artist} released Mud Slide Slim in {year}.")

The f before the quote tells Python to evaluate anything inside {} as a variable or expression. You’ll use f-strings constantly.
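The braces accept full expressions, not just variable names, so you can compute values inline. A quick sketch:

```python
year = 1971

# Any expression works inside the braces, including arithmetic and method calls
print(f"That album turned 50 in {year + 50}.")         # That album turned 50 in 2021.
print(f"Shouting the title: {'mud slide slim'.upper()}")  # Shouting the title: MUD SLIDE SLIM
```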
11.4 Installing Packages
Like R, Python is package-based. The package manager is called pip. You run it from the terminal, not from inside Python itself.
In PyCharm’s Terminal, use this command on both Mac and PC:
python -m pip install pandas

The python -m pip syntax ensures you’re always using the pip that belongs to the Python interpreter your project is using. It works identically on both platforms.
Why not just pip install?
On Mac, if you ever open a terminal outside of PyCharm, pip may point to an older Python 2 installation. python -m pip always refers to the version of Python you opened. On Windows, pip install usually works fine outside of PyCharm too, but python -m pip is the safer habit.
When using PyCharm’s built-in Terminal, pip install works correctly on both platforms because PyCharm configures the environment for you. In this book, we’ll use python -m pip install throughout for consistency.
Once installed, you import packages at the top of your Python file:
import pandas as pd
import matplotlib.pyplot as plt
import requests

The as pd and as plt parts create aliases — a universal convention for these two packages. Install once; import in every file that needs it.
11.5 Working with Data
The primary data package in Python is pandas, which introduces the DataFrame — the same concept as R’s data frame. If you know dplyr, you already understand pandas conceptually.
11.5.1 Downloading the Data
For this chapter, we’ll use the US baby names dataset — the same underlying data as the babynames R package we’ve used throughout this book.
Download the file NationalNames.csv from Kaggle:
You’ll need a free Kaggle account. Once downloaded, place NationalNames.csv directly inside your PyCharm project folder (the same folder as your Python files).
How to find your project folder: In PyCharm’s Project panel on the left, right-click your project name and choose Open In → Finder (Mac) or Open In → Explorer (Windows). Drag your downloaded CSV into that folder.
11.5.2 Loading Data
import pandas as pd
babynames = pd.read_csv('NationalNames.csv')

Notice we didn’t need to specify a full path — because our CSV is in the project folder, Python finds it automatically. This is one of the key benefits of using PyCharm’s project system.
You can also load data directly from a URL, without downloading anything first:
covid = pd.read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv")

This is especially useful when working with data that updates regularly or lives on GitHub.
11.5.3 Exploring Your Data
babynames.head() # first 5 rows
babynames.tail(10) # last 10 rows
babynames.info() # column names, types, non-null counts
babynames.describe() # summary statistics
babynames.shape # (rows, columns) as a tuple
babynames.columns # column names (an Index object)

The NationalNames.csv file has five columns: Id, Name, Year, Gender, and Count. This is essentially the same data as the R babynames package, with Count in place of n.
11.5.4 Filtering
Where R uses filter(), pandas uses boolean indexing:
# Only rows where Name is "Ringo"
ringo = babynames[babynames['Name'] == 'Ringo']
# Multiple conditions — wrap each condition in parentheses, use & for AND, | for OR
ringo_male = babynames[(babynames['Name'] == 'Ringo') & (babynames['Gender'] == 'M')]

11.5.5 Sorting
babynames.sort_values('Count', ascending=False).head(10)

11.5.6 Grouping and Summarizing
The pandas equivalent of group_by() |> summarize():
name_totals = (
    babynames
    .groupby('Name')['Count']
    .sum()
    .reset_index()
    .sort_values('Count', ascending=False)
)
name_totals.head(10)

11.5.7 Creating New Columns
The pandas equivalent of mutate():
babynames['name_length'] = babynames['Name'].str.len()

11.5.8 Visualizing with Matplotlib
import matplotlib.pyplot as plt
# Filter to Brian, male only
brian = babynames[(babynames['Name'] == 'Brian') & (babynames['Gender'] == 'M')]
plt.plot(brian['Year'], brian['Count'])
plt.xlabel('Year')
plt.ylabel('Count')
plt.title('Popularity of the Name Brian Over Time')
plt.show()

For a bar chart of the most popular names in 1975:
top_1975 = (
    babynames[babynames['Year'] == 1975]
    .sort_values('Count', ascending=False)
    .head(10)
)
plt.figure(figsize=(10, 6))
plt.barh(top_1975['Name'], top_1975['Count'])
plt.xlabel('Count')
plt.title('Top 10 Names in 1975')
plt.gca().invert_yaxis() # put the most popular name at the top
plt.tight_layout()
plt.show()

11.6 Working with APIs
An API (Application Programming Interface) lets you query an external database and get back only the data you need. Python’s requests library handles this cleanly. Most APIs return data in JSON format, which Python reads as a dictionary.
11.6.1 Installing Requests
In PyCharm’s Terminal:
python -m pip install requests

11.6.2 A Basic API Request
Let’s use the Open Meteo weather API, which requires no account or key:
import requests
url = "https://api.open-meteo.com/v1/forecast"
params = {
    "latitude": 40.7128,
    "longitude": -74.0060,
    "current_weather": True
}
response = requests.get(url, params=params)
data = response.json()
print(data['current_weather'])

- requests.get() sends an HTTP GET request to the URL
- .json() parses the response as a Python dictionary
- We navigate the result with bracket notation, the same as a dictionary
11.6.3 Checking for Errors
APIs can fail — bad keys, rate limits, server issues. Always check the status code:
if response.status_code == 200:
    data = response.json()
    print("Success!")
else:
    print(f"Error {response.status_code}: {response.text}")

200 = success. 401 = bad API key. 429 = rate limit exceeded.
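If you find yourself checking codes often, a small helper keeps the logic tidy. This is an illustrative sketch of my own, not part of the requests library:

```python
def describe_status(code):
    """Map common HTTP status codes to plain-English messages (illustrative, not exhaustive)."""
    messages = {
        200: "Success",
        401: "Unauthorized (check your API key)",
        404: "Not found (check the URL)",
        429: "Rate limit exceeded (slow down and retry later)",
    }
    return messages.get(code, f"Unexpected status code: {code}")

print(describe_status(200))  # Success
print(describe_status(429))  # Rate limit exceeded (slow down and retry later)
```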
11.6.4 The Genius API: Getting Song Lyrics
The Genius API lets you search song lyrics and artist data — a great source for media analytics. To get your API key:
- Create a free account at genius.com
- Go to https://genius.com/api-clients
- Click New API Client, fill in a name and website (anything works), and click Save
- Click Generate Access Token — copy it and keep it safe
First, install the lyricsgenius wrapper package, which makes the Genius API much easier to use than calling it directly:
python -m pip install lyricsgenius
from lyricsgenius import Genius
genius = Genius("YOUR_CLIENT_ACCESS_TOKEN_HERE")
# Download an entire album's lyrics
album = genius.search_album("Blood on the Tracks", "Bob Dylan")
album.save_lyrics() # saves to a JSON file in your project folder
# Or search for a single song
song = genius.search_song("Fire and Rain", "James Taylor")
print(song.lyrics[:500]) # first 500 characters of the lyrics

You can then load the saved JSON file with pandas and analyze the lyrics using the same text analysis techniques from the Text Analysis chapter — but now in Python.
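The saved file is ordinary JSON, so Python's built-in json module can read it too. The exact layout lyricsgenius writes can vary by version, so treat the field names below ("tracks", "song", "title", "lyrics") as assumptions to verify against your own file. A sketch, using an inline sample with that assumed shape:

```python
import json

# Inline sample standing in for a saved lyrics file; in practice you would do:
#   with open("Lyrics_BloodontheTracks.json") as f:
#       album = json.load(f)
# The field names here are assumptions -- check them against your actual file.
raw = '''
{
  "name": "Blood on the Tracks",
  "tracks": [
    {"song": {"title": "Tangled Up in Blue", "lyrics": "(lyrics here)"}},
    {"song": {"title": "Simple Twist of Fate", "lyrics": "(lyrics here)"}}
  ]
}
'''

album = json.loads(raw)
for track in album["tracks"]:
    song = track["song"]
    print(f"{song['title']}: {len(song['lyrics'])} characters of lyrics")
```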
11.7 Connecting to an AI API
Large language models from OpenAI can be called from Python in the same way we call any other API. The difference is that your ‘request’ is a conversation, and the ‘response’ is AI-generated text.
11.7.1 Getting an OpenAI API Key
Go to https://platform.openai.com and create an account. In your account menu, navigate to API Keys and click Create new secret key. Copy it immediately — you will not be able to see it again after closing the window.
11.7.2 Install the Required Packages
In PyCharm’s Terminal:
python -m pip install openai python-dotenv11.7.3 Storing Your API Key Safely: The .env File
Your API key is essentially a password — it grants access to a service you pay for. Never paste it directly into your Python code and never commit it to GitHub. The standard way to handle this is a .env file.
A .env file is a plain text file that lives in your project folder and stores sensitive values like API keys. Python can read it with a package called dotenv. The file never gets shared or published because we add it to .gitignore.
11.7.3.1 Creating a .env File in PyCharm (Mac and PC)
This is the easiest method and works identically on both platforms:
- In PyCharm’s Project panel on the left, right-click your project folder (the top-level folder)
- Choose New → File
- Type .env as the filename (the dot at the beginning is intentional and important)
- Click OK
PyCharm will open the file. Type your API key like this — no quotes, no spaces:
OPENAI_API_KEY=sk-your-actual-key-goes-here
Save the file. That’s it.
Why the dot in front? Files starting with a dot are hidden files on Mac and Linux — they don’t show up in Finder or regular directory listings. This is a convention for configuration files. PyCharm shows them in its Project panel regardless.
11.7.3.2 Creating a .env File Outside of PyCharm
If you prefer to use a plain text editor:
On a Mac:
- Open TextEdit
- Go to Format → Make Plain Text (important — if you skip this, it will save as a .rtf file, which won’t work)
- Type your key: OPENAI_API_KEY=sk-your-key-here
- Go to File → Save As, name it .env, and navigate to your project folder
- When prompted, click Use .env to confirm the unusual filename
Alternatively, use the Terminal app:
cd ~/your-project-folder
nano .env

Type your key, then press Control+O to save and Control+X to exit.
On Windows:
- Open Notepad
- Type your key: OPENAI_API_KEY=sk-your-key-here
- Go to File → Save As
- In the Save as type dropdown, choose All Files (*.*)
- Set the filename to .env and navigate to your project folder
- Click Save
11.7.3.3 Reading the .env File in Python
from dotenv import load_dotenv
import os
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

load_dotenv() reads your .env file and makes its values available as environment variables. os.getenv() retrieves them by name. After this, the OpenAI client will find the key automatically.
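To demystify what load_dotenv() does, here is a toy version of the same idea in plain Python. The real package handles quoting, comments, and many edge cases this sketch ignores:

```python
import os

def load_env_text(text):
    """Parse KEY=VALUE lines into os.environ -- a toy version of load_dotenv()."""
    for line in text.splitlines():
        line = line.strip()
        # skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        os.environ[key.strip()] = value.strip()

# Simulating the contents of a .env file
load_env_text("OPENAI_API_KEY=sk-example-not-a-real-key")
print(os.getenv("OPENAI_API_KEY"))  # sk-example-not-a-real-key
```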
11.7.3.4 Keeping the Key Out of Git
If you use git and GitHub, add .env to your .gitignore file so it never gets pushed to a public repository. In PyCharm’s Terminal:
echo ".env" >> .gitignore

Or open .gitignore in PyCharm and add .env on its own line.
11.7.4 Your First AI Call
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
client = OpenAI() # automatically reads OPENAI_API_KEY from your .env
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What are the top five most streamed artists on Spotify?"}
    ]
)

print(response.choices[0].message.content)

The messages parameter is a list of dictionaries. Each message has a role:
- "user" — your message to the AI
- "assistant" — the AI’s previous replies
- "system" — instructions about how the AI should behave (not shown to the user)
11.7.5 The System Prompt
The "system" role is how you customize the AI’s persona and behavior before the conversation begins:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You are a media analytics assistant. Answer questions in the context of journalism, communications, and digital media."
        },
        {
            "role": "user",
            "content": "What does engagement rate mean?"
        }
    ]
)

print(response.choices[0].message.content)

11.7.6 Maintaining a Conversation
The API is stateless — it has no memory between separate calls. To maintain a multi-turn conversation, you send the full history of messages with every request:
chat_history = [
    {"role": "system", "content": "You are a helpful media analytics assistant."}
]
# Turn 1
chat_history.append({"role": "user", "content": "What is sentiment analysis?"})
response = client.chat.completions.create(model="gpt-4o", messages=chat_history)
reply = response.choices[0].message.content
chat_history.append({"role": "assistant", "content": reply})
print(reply)
# Turn 2 — the model "remembers" the previous exchange because we sent the full history
chat_history.append({"role": "user", "content": "Can you give me a Python example of that?"})
response = client.chat.completions.create(model="gpt-4o", messages=chat_history)
reply = response.choices[0].message.content
chat_history.append({"role": "assistant", "content": reply})
print(reply)

This pattern — appending each exchange and re-sending the full list — is the core mechanic behind every chatbot, regardless of how they’re presented to users.
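You can factor the pattern into a small helper. The sketch below keeps the API call behind a pluggable `ask` callable (anything that takes the message list and returns a reply string) so the history bookkeeping can be seen and tested on its own. The function name and structure are my own, not part of the OpenAI SDK:

```python
def chat_turn(chat_history, user_message, ask):
    """Append the user's message, get a reply from `ask`, and record it.

    `ask` is any callable that accepts the full message list and returns the
    assistant's reply as a string (e.g., a wrapper around the OpenAI call).
    """
    chat_history.append({"role": "user", "content": user_message})
    reply = ask(chat_history)
    chat_history.append({"role": "assistant", "content": reply})
    return reply

# A stand-in "model" that just counts how many messages it was sent
fake_ask = lambda messages: f"I received {len(messages)} messages."

history = [{"role": "system", "content": "You are a helpful assistant."}]
print(chat_turn(history, "What is sentiment analysis?", fake_ask))  # I received 2 messages.
print(chat_turn(history, "Give me an example.", fake_ask))          # I received 4 messages.
```

Swapping `fake_ask` for a real API wrapper gives you the same multi-turn behavior as the longer example above.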
11.8 Building an AI App with chat_with_memory
We’ve seen how to call an AI API from a script. Now let’s turn that into a web application that runs in a browser and lets anyone have a conversation with an AI.
The application we’ll build uses:
- Quart — a lightweight Python web framework (similar to Flask, but with built-in support for the asynchronous calls our OpenAI client will make)
- Session storage — stores the conversation history in the user’s browser between requests, giving the AI “memory”
- Jinja2 templates — HTML files with placeholders that Quart fills in with real data before sending to the browser
11.8.1 Install Required Packages
In PyCharm’s Terminal:
python -m pip install quart openai python-dotenv

11.8.2 Project Structure
Your project folder should look exactly like this:
your-project-folder/
├── chat_with_memory.py
├── .env
└── templates/
└── index.html
The templates/ folder is not optional — Quart (and Flask) require that HTML templates live in a subfolder with exactly that name.
11.8.2.1 Creating the templates folder in PyCharm
- In the Project panel, right-click your project folder
- Choose New → Directory
- Name it templates (all lowercase) and press Enter
11.8.2.2 Creating index.html in PyCharm
- Right-click the templates folder
- Choose New → HTML File
- Name it index and press Enter
PyCharm will create templates/index.html with some starter HTML. Replace all of it with the following:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>AI Chat</title>
<style>
body {
font-family: Verdana, sans-serif;
max-width: 700px;
margin: 40px auto;
padding: 0 20px;
}
textarea {
width: 100%;
font-size: 1em;
padding: 8px;
}
button {
margin-top: 10px;
padding: 10px 20px;
font-size: 1em;
}
#response-box {
margin-top: 20px;
padding: 16px;
background: #f4f4f4;
border-radius: 6px;
white-space: pre-wrap;
}
.spinner {
display: none;
margin: 10px 0;
font-style: italic;
color: #888;
}
</style>
</head>
<body>
<h1>Ask Me Anything</h1>
<form action="/chat" method="post" id="chat-form">
<textarea name="user_input" rows="4" placeholder="Ask something..." required></textarea>
<br>
<button type="submit" id="submit-btn">Send</button>
</form>
<p class="spinner" id="loading">Thinking...</p>
<div id="response-box">
<strong>Response:</strong><br><br>
{{ assistant_reply | safe }}
</div>
<script>
document.getElementById('chat-form').addEventListener('submit', function() {
document.getElementById('loading').style.display = 'block';
document.getElementById('submit-btn').disabled = true;
});
</script>
</body>
</html>

The {{ assistant_reply | safe }} placeholder is Jinja2 syntax — Quart replaces it with the actual AI response before sending the page to the browser. The | safe filter allows any HTML in the response to render properly.
11.8.3 The Application: chat_with_memory.py
Create a new Python file in your project folder called chat_with_memory.py and add the following:
from quart import Quart, render_template, request, session
from dotenv import load_dotenv
from openai import AsyncOpenAI
load_dotenv()
client = AsyncOpenAI()
app = Quart(__name__)
app.secret_key = "change-this-to-something-unique"
@app.route('/', methods=['GET'])
async def index():
    return await render_template('index.html', assistant_reply="")

@app.route('/chat', methods=['POST'])
async def chat():
    try:
        form_data = await request.form
        query = form_data['user_input']

        # Get the conversation history from the session, or start a fresh one
        chat_history = session.get('chat_history', [])

        # Remove old system messages so we always have a fresh one at the top
        chat_history = [m for m in chat_history if m["role"] != "system"]
        chat_history.insert(0, {"role": "system", "content": "You are a helpful assistant."})

        # Add the user's new message
        chat_history.append({"role": "user", "content": query})

        # Call the OpenAI API with the full conversation history
        response = await client.chat.completions.create(
            model="gpt-4o",
            messages=chat_history
        )
        assistant_response = response.choices[0].message.content
        chat_history.append({"role": "assistant", "content": assistant_response})

        # Save the updated history back to the session
        session['chat_history'] = chat_history

        return await render_template('index.html', assistant_reply=assistant_response)
    except Exception as e:
        app.logger.error(f"Error: {e}")
        return await render_template("index.html", assistant_reply="Something went wrong. Please try again.")

if __name__ == "__main__":
    app.run(debug=True, port=8080)

11.8.4 Walking Through the Code
load_dotenv() reads your .env file so the API key is available without being written into the code.
AsyncOpenAI() creates an OpenAI client that uses asynchronous calls. The async/await keywords throughout the file tell Python: “while we’re waiting for OpenAI to respond, the web server can handle other tasks.” This async support is why we use Quart rather than regular Flask.
app.secret_key is used by Quart to encrypt the session data stored in the user’s browser cookie. Change this string to something unique in any real deployment.
Routes are how a web application responds to different URLs. The @app.route('/') decorator connects the homepage to the index() function, which just renders the empty chat form. The @app.route('/chat') route handles form submissions.
session is a dictionary that Quart stores as an encrypted cookie in the user’s browser. It persists between page loads, which is exactly how the app ‘remembers’ the conversation. Each time the user submits a message, we retrieve the history, add the new exchange, and save it back.
11.8.5 Running the App
Make sure your .env file is in the project folder, then open PyCharm’s Terminal and run:
python chat_with_memory.py

You’ll see output like:
Running on http://127.0.0.1:8080
Open http://127.0.0.1:8080 in your browser. Type a question, click Send, and the AI will respond. Ask a follow-up — it will remember what was said before.
To stop the app, click in the Terminal and press Control+C.
Why port 8080? Port 5000 is used by Apple’s AirPlay Receiver service on modern Macs, which causes a conflict. Port 8080 is a standard alternative that works on both Mac and Windows without any conflicts.
11.8.6 Customizing the App
Change the AI’s persona by editing the system message inside chat_with_memory.py. For example, define a system_prompt variable and use it in place of the hard-coded "You are a helpful assistant." content:
system_prompt = """You are a media analytics tutor specializing in
digital journalism and audience measurement. Give clear, practical
examples using real platforms like Instagram, YouTube, and Spotify."""

Reset the conversation: Because history is stored in the browser session, closing and reopening the browser tab will start a fresh conversation. You can also add a “Clear” button that calls a route to delete session['chat_history'].
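The clearing logic itself is just deleting one key from the session. Here is a sketch on a plain dict; in the app, Quart's session object behaves the same way, and you would call this from a route of your own (a hypothetical /clear, for example):

```python
def reset_history(session):
    """Drop the stored conversation so the next message starts fresh.

    `session` is any dict-like mapping; in the app it would be Quart's session object.
    """
    session.pop('chat_history', None)  # no error raised if the key is missing

# Simulated session with an existing conversation
fake_session = {'chat_history': [{"role": "user", "content": "Hi"}]}
reset_history(fake_session)
print('chat_history' in fake_session)  # False
```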
11.8.7 Next Steps
From here, natural extensions include:
- Streaming responses — display the AI’s text word-by-word as it’s generated instead of waiting for the full response. The OpenAI Python SDK supports this natively with stream=True.
- A persistent database — the session cookie has a size limit. A database (SQLite, PostgreSQL) lets users return to previous conversations.
- Deploying online — platforms like Render, Railway, or Heroku can host a Quart app for free or low cost, making your app accessible to anyone, not just your own computer.
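As a taste of the database route, Python ships with sqlite3 in the standard library, so persisting messages takes only a few lines. This is a minimal sketch; the table name and columns are my own choices, not a standard schema:

```python
import sqlite3

# An in-memory database for illustration; use a filename like "chats.db" to persist to disk
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        role TEXT NOT NULL,
        content TEXT NOT NULL
    )
""")

def save_message(conn, role, content):
    """Insert one chat message; parameterized ? placeholders avoid SQL injection."""
    conn.execute("INSERT INTO messages (role, content) VALUES (?, ?)", (role, content))
    conn.commit()

def load_history(conn):
    """Rebuild the message list in the same shape the OpenAI API expects."""
    rows = conn.execute("SELECT role, content FROM messages ORDER BY id").fetchall()
    return [{"role": role, "content": content} for role, content in rows]

save_message(conn, "user", "What is sentiment analysis?")
save_message(conn, "assistant", "It measures the emotional tone of text.")
print(load_history(conn))
```

Because load_history() returns the same list-of-dictionaries structure as chat_history, it can be dropped into the app in place of the session lookup.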