metadata
title: Data Analysis with DuckDB
original_url: https://tds.s-anand.net/#/data-analysis-with-duckdb?id=data-analysis-with-duckdb
downloaded_at: '2025-06-08T23:26:27.065997'

Data Analysis with DuckDB

You’ll learn how to perform data analysis using DuckDB and Pandas, covering:

  • Parquet for Data Storage: Understand why Parquet is faster, more compact, and more strictly typed than CSV, JSON, and SQLite.
  • DuckDB Setup: Learn how to install and set up DuckDB and use it in a Jupyter notebook.
  • File Format Comparisons: Compare CSV, JSON, SQLite, and Parquet by speed and size, observing how long each takes to save and load the same data and how large the resulting files are.
  • Faster Queries with DuckDB: Learn how DuckDB uses parallel processing, columnar storage, and on-disk operations to outperform Pandas in speed and memory efficiency.
  • SQL Query Execution in DuckDB: Run SQL queries directly on Parquet files and Pandas DataFrames to compute metrics such as the number of unique flight routes delayed by certain time intervals.
  • Memory Efficiency: Understand how DuckDB performs analytics without loading entire datasets into memory, making it highly efficient for large-scale data analysis.
  • Mixing DuckDB and Pandas: Learn to interleave DuckDB and Pandas operations, leveraging the strengths of both tools to perform complex queries like correlations and aggregations.
  • Ranking and Filtering Data: Use SQL and Pandas to rank arrival delays by distance and extract key insights, such as the earliest flight arrival for each route.
  • Joining Data: Create a cost analysis by joining datasets and calculating total costs of flight delays, demonstrating DuckDB’s speed in joining and aggregating large datasets.

Here are the links used in the video:
