---
title: "Cleaning Data with OpenRefine"
original_url: "https://tds.s-anand.net/#/cleaning-data-with-openrefine?id=cleaning-data-with-openrefine"
downloaded_at: "2025-06-08T23:26:48.911609"
---
Cleaning Data with OpenRefine
-----------------------------

[Video: Cleaning Data with OpenRefine](https://youtu.be/zxEtfHseE84)

This session covers how to use OpenRefine for data cleaning, focusing on resolving entity discrepancies:

* **Data Upload and Project Creation**: Import data into OpenRefine and create a new project for analysis.
* **Faceting Data**: Use text facets to group identical values and see how often each address variant appears.
* **Clustering Methodology**: Apply clustering algorithms to find and merge entries that differ only slightly, for example in punctuation.
* **Manual and Automated Clustering**: Review and merge clusters one at a time, or merge all suggested clusters in one go when you trust the algorithm's results.
* **Entity Resolution**: Resolve multiple versions of the same entity into a single value, then save the cleaned data.
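
OpenRefine does all of this through its GUI, so the following is not OpenRefine's code. It is a minimal Python sketch, assuming the data is a plain list of address strings, of what a text facet and key-collision clustering with the "fingerprint" keying function do conceptually. The `fingerprint` function roughly follows the keying steps described in OpenRefine's documentation: trim, lowercase, fold accented characters to ASCII, drop punctuation, then sort and de-duplicate the tokens. The names `text_facet`, `find_clusters`, and `merge_clusters` and the sample addresses are hypothetical, made up for illustration. Choosing the most frequent variant as the merged value is an assumption of this sketch.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value: str) -> str:
    """Key-collision key, modelled on OpenRefine's documented fingerprint steps."""
    s = value.strip().lower()
    # Fold accented characters to their closest ASCII form (e.g. "é" -> "e").
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    # Drop punctuation so "St." and "St" produce the same token.
    s = re.sub(r"[^\w\s]", "", s)
    # Sort and de-duplicate tokens so word order does not matter.
    return " ".join(sorted(set(s.split())))


def text_facet(values: list[str]) -> Counter:
    """Count each distinct value, like the value/count list of a text facet."""
    return Counter(values)


def find_clusters(values: list[str]) -> dict[str, list[str]]:
    """Group distinct values sharing a fingerprint; keep groups with 2+ variants."""
    groups: dict[str, list[str]] = defaultdict(list)
    for value in dict.fromkeys(values):  # distinct values, first-seen order
        groups[fingerprint(value)].append(value)
    return {key: vals for key, vals in groups.items() if len(vals) > 1}


def merge_clusters(values: list[str]) -> list[str]:
    """Replace every variant in a cluster with its most frequent spelling (assumption)."""
    counts = text_facet(values)
    mapping: dict[str, str] = {}
    for variants in find_clusters(values).values():
        canonical = max(variants, key=lambda v: counts[v])
        for variant in variants:
            mapping[variant] = canonical
    return [mapping.get(v, v) for v in values]


# Hypothetical sample data (not the course dataset).
addresses = [
    "12 Main St", "12 Main St", "12 main st", "12 Main St.", "Main St 12",
    "45 Park Ave", "45 Park Ave", "45 Park Ave.", "7 Elm Rd",
]
print(find_clusters(addresses))
# {'12 main st': ['12 Main St', '12 main st', '12 Main St.', 'Main St 12'], '45 ave park': ['45 Park Ave', '45 Park Ave.']}
print(text_facet(merge_clusters(addresses)))
# Counter({'12 Main St': 5, '45 Park Ave': 3, '7 Elm Rd': 1})
```

Because this key ignores case, punctuation, and word order, "Main St 12" lands in the same cluster as "12 Main St.". A different keying or distance method (OpenRefine also offers n-gram fingerprints, phonetic keys, and nearest-neighbour methods) would change which entries end up merged.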

Here are links used in the video:

* [OpenRefine software](https://openrefine.org)
* [Dataset for OpenRefine](https://drive.google.com/file/d/1ccu0Xxk8UJUa2Dz4lihmvzhLjvPy42Ai/view)