---
title: "Data Analysis with SQL"
original_url: "https://tds.s-anand.net/#/data-analysis-with-sql?id=data-analysis-with-sql"
downloaded_at: "2025-06-08T23:22:33.461136"
---
Data Analysis with SQL
----------------------
[![Data Analysis with Databases](https://i.ytimg.com/vi_webp/Xn3QkYrThbI/sddefault.webp)](https://youtu.be/Xn3QkYrThbI)
You’ll learn how to perform data analysis using SQL (via Python), covering:
* **Database Connection**: How to connect to a MySQL database using SQLAlchemy and Pandas.
* **SQL Queries**: Execute SQL queries directly from a Python environment to retrieve and analyze data.
* **Counting Rows**: Use SQL to count the number of rows in a table.
* **User Activity Analysis**: Query and identify top users by post count.
* **Post Concentration**: Determine if a small percentage of users contribute the majority of posts using SQL aggregation.
* **Correlation Calculation**: Calculate the Pearson correlation coefficient between user attributes such as age and reputation.
* **Regression Analysis**: Compute the regression slope to understand the relationship between views and reputation.
* **Handling Large Data**: Perform calculations on large datasets by fetching aggregated values from the database rather than entire datasets.
* **Statistical Analysis in SQL**: Use SQL as a tool for statistical analysis, demonstrating its power beyond simple data retrieval.
* **Leveraging AI**: Use ChatGPT to generate SQL queries and Python code, enhancing productivity and accuracy.
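
The counting, top-user, and post-concentration steps listed above can be sketched as follows. This is a minimal illustration, not the notebook's code: it assumes an in-memory SQLite database (standard library `sqlite3`) in place of the MySQL server and SQLAlchemy/Pandas setup used in the video, and the `users` and `posts` tables, their columns, and the 20% cut-off are hypothetical choices made for the example.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL database from the video.
# Table names, column names, and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (Id INTEGER PRIMARY KEY, DisplayName TEXT);
    CREATE TABLE posts (Id INTEGER PRIMARY KEY, OwnerUserId INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy'), (4, 'dee'), (5, 'eve');
    INSERT INTO posts (OwnerUserId) VALUES (1),(1),(1),(1),(1),(1),(2),(2),(3),(4);
""")

# Counting rows: let the database count instead of fetching every row.
post_count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]

# User activity: top users by number of posts (ties broken by user Id).
top_users = conn.execute("""
    SELECT u.DisplayName, COUNT(*) AS n_posts
    FROM posts p
    JOIN users u ON u.Id = p.OwnerUserId
    GROUP BY u.Id, u.DisplayName
    ORDER BY n_posts DESC, u.Id
    LIMIT 3
""").fetchall()

# Post concentration: share of all posts written by the most active 20% of users.
n_users = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
top_n = max(1, round(0.2 * n_users))
share = conn.execute("""
    SELECT SUM(n_posts) * 1.0 / (SELECT COUNT(*) FROM posts)
    FROM (
        SELECT COUNT(*) AS n_posts
        FROM posts
        GROUP BY OwnerUserId
        ORDER BY n_posts DESC
        LIMIT ?
    )
""", (top_n,)).fetchone()[0]

print(post_count, top_users, share)
```

In each query the grouping and counting happen inside the database, so only a handful of result rows travel back to Python.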
Here are the links used in the video:
* [Data analysis with databases - Notebook](https://colab.research.google.com/drive/1j_5AsWdf0SwVHVgfbEAcg7vYguKUN41o)
* [SQLZoo](https://www.sqlzoo.net/wiki/SQL_Tutorial): simple interactive tutorials for learning SQL
* [Stats database](https://relational-data.org/dataset/Stats): an anonymized dump of [stats.stackexchange.com](https://stats.stackexchange.com/)
* [Pandas `read_sql`](https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html)
* [SQLAlchemy docs](https://docs.sqlalchemy.org/)
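
To make the "fetch aggregated values rather than entire datasets" idea from the topic list concrete, here is a hedged sketch that computes a Pearson correlation and a regression slope from six sums returned by a single query. It again assumes an in-memory SQLite database instead of MySQL; the `users` table, its `Views` and `Reputation` columns, the sample rows, and the choice to regress reputation on views are all assumptions made for illustration.

```python
import math
import sqlite3

# Assumption: in-memory SQLite with hypothetical sample data stands in for
# the Stats database; column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (Id INTEGER PRIMARY KEY, Views REAL, Reputation REAL);
    INSERT INTO users (Views, Reputation)
    VALUES (1, 2), (2, 4), (3, 5), (4, 4), (5, 5);
""")

# One query returns six aggregates; no individual rows leave the database.
n, sx, sy, sxx, syy, sxy = conn.execute("""
    SELECT COUNT(*),
           SUM(Views),
           SUM(Reputation),
           SUM(Views * Views),
           SUM(Reputation * Reputation),
           SUM(Views * Reputation)
    FROM users
    WHERE Views IS NOT NULL AND Reputation IS NOT NULL
""").fetchone()

# These are n^2 times the covariance and variances; the factor cancels below.
cov_xy = n * sxy - sx * sy
var_x = n * sxx - sx ** 2
var_y = n * syy - sy ** 2

# Pearson correlation between Views and Reputation.
r = cov_xy / math.sqrt(var_x * var_y)

# Least-squares slope of Reputation regressed on Views.
slope = cov_xy / var_x

print(round(r, 4), round(slope, 4))
```

The same six sums work for a table with millions of rows, which is why the aggregate approach scales where loading the full table into memory would not. With very large values, sums of squares can lose floating-point precision, so results are worth spot-checking.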