Automating File Management with Python and RegEx

Skills and Tools:

  • Python
  • Regular Expressions
  • Data Mining
  • PDF Manipulation
  • Process Automation
  • OS Library
  • GitHub
  • Business Process

Overview

Automating repetitive and time-consuming tasks can significantly boost productivity and efficiency. Manually organizing, renaming, and sorting PDF files can be tedious and error-prone. This project aims to streamline this process by providing a Python script that automates the organization of PDF files based on their content. The script not only saves time but also ensures consistency and accuracy in file management.

The script works by identifying files with the .pdf extension in specified folders. It then reads the content of each file, searching for specific patterns using regular expressions. Based on these patterns, the script applies predefined naming conventions, renames the files accordingly, and moves them to the destination folder.
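
A minimal sketch of the matching and naming step described above, assuming the text of a PDF has already been extracted into a string. The rule list, the patterns, and the filename templates are hypothetical illustrations, not the ones used in the actual script.

```python
import re

# Hypothetical rules: each pairs a compiled pattern with a filename template.
# The named groups captured by the pattern fill in the template.
RULES = [
    (re.compile(r"Invoice\s+No\.?\s*(?P<number>\d+)", re.IGNORECASE),
     "invoice_{number}.pdf"),
    (re.compile(r"Statement\s+for\s+(?P<month>[A-Za-z]+)\s+(?P<year>\d{4})",
                re.IGNORECASE),
     "statement_{year}_{month}.pdf"),
]


def propose_name(text, rules=RULES):
    """Return the filename for the first rule that matches, or None."""
    for pattern, template in rules:
        match = pattern.search(text)
        if match:
            return template.format(**match.groupdict())
    return None


print(propose_name("ACME Ltd. Invoice No. 10423, due in 30 days"))
print(propose_name("Statement for March 2024"))
print(propose_name("Meeting notes"))
```

Keeping the rules in a list means a new document type only needs a new pattern and template, and a file that matches nothing is left untouched because the function returns None.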

This type of project is excellent for sharpening and broadening Python skills beyond mainstream applications. It is part of an initiative to eliminate the most disliked tasks in my work routine. Mastering Regular Expressions is particularly useful for data mining activities.

It is really fun to experiment until you find regular expressions that match specific patterns in a text. The resulting code can look a little scary at first glance, but once you master regular expressions, things start to make sense. I’ll share a code snippet below.
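
As an illustration of how a scary-looking pattern can be tamed, here is a small sketch that writes the same expression twice: once compactly and once with re.VERBOSE, which allows whitespace and comments inside the pattern. The date format and the sample sentence are assumptions made for the example and are not taken from the project's script.

```python
import re

# Compact form: correct, but hard to read at first glance.
DATE_COMPACT = re.compile(r"\b(\d{2})[/-](\d{2})[/-](\d{4})\b")

# Same pattern with re.VERBOSE: whitespace is ignored and comments are allowed.
DATE_VERBOSE = re.compile(
    r"""
    \b
    (?P<day>\d{2})      # two-digit day
    [/-]                # separator: slash or hyphen
    (?P<month>\d{2})    # two-digit month
    [/-]                # separator again
    (?P<year>\d{4})     # four-digit year
    \b
    """,
    re.VERBOSE,
)

sample = "Issued on 12/03/2024 by the billing department."
match = DATE_VERBOSE.search(sample)
print(match.groupdict())
```

Named groups such as day, month, and year also make the later renaming code easier to follow than positional groups.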

 

Features

  • PDF Inspection: Scans PDF files in the DOWNLOADS and DESKTOP folders.
  • Content-Based Renaming: Renames PDF files according to specific text content rules.
  • File Organization: Moves renamed files to the destination folder.
  • Duplicate Management: Deletes older duplicates of the same file type.
  • Logging: Generates logs for all actions performed.
  • Time Calculation: Calculates the time saved through automation based on the logs.
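
The duplicate management feature could be sketched roughly as follows. This assumes that two files count as duplicates when their names share the same prefix before the first underscore, and that "older" means an earlier modification time. Both are assumptions for illustration, and the function name is hypothetical.

```python
import os
from collections import defaultdict


def remove_older_duplicates(folder, key=lambda name: name.split("_")[0]):
    """Keep the newest PDF per key in `folder` and delete the others.

    Returns the list of deleted paths.
    """
    groups = defaultdict(list)
    for name in os.listdir(folder):
        if name.lower().endswith(".pdf"):
            groups[key(name)].append(os.path.join(folder, name))

    deleted = []
    for paths in groups.values():
        # Sort newest first by modification time, then delete the remainder.
        paths.sort(key=os.path.getmtime, reverse=True)
        for old_path in paths[1:]:
            os.remove(old_path)
            deleted.append(old_path)
    return deleted
```

Because os.remove is permanent, a cautious variant might move the older files to an archive folder instead of deleting them.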

Requirements:

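  • Python 3.x
  • pdfminer.six library (third-party, installed with pip install pdfminer.six)
  • re (regular expressions) module, part of the Python standard library
  • os module, part of the Python standard library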
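
A minimal sketch of how these pieces might fit together when reading a PDF. It relies on extract_text from pdfminer.high_level, which pdfminer.six provides. The helper names read_pdf_text and find_in_pdf, and the optional extractor parameter (useful for trying the logic without a real PDF), are hypothetical.

```python
import re


def read_pdf_text(path, extractor=None, max_pages=0):
    """Return the text of the PDF at `path`.

    `max_pages=0` is passed through as pdfminer's `maxpages`, where 0 means
    no page limit.
    """
    if extractor is None:
        # Imported lazily so this module loads even without pdfminer.six.
        from pdfminer.high_level import extract_text as extractor
    return extractor(path, maxpages=max_pages)


def find_in_pdf(path, pattern, extractor=None):
    """Return the first substring of the PDF text matching `pattern`, or None."""
    text = read_pdf_text(path, extractor=extractor)
    match = re.search(pattern, text)
    return match.group(0) if match else None
```

For documents whose identifying text sits on the first page, passing max_pages=1 may speed things up, at the risk of missing patterns that appear later.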

Usage

  • Set Working Directory: The script starts by setting the working directory to the DESKTOP.
  • Specify Destination Folder: The destination folder for the organized PDF files is specified.
  • Scan for PDFs: The script scans for PDF files in the specified folders.
  • Inspect and Rename: Each PDF is inspected for specific text patterns and renamed accordingly.
  • Move Files: Renamed files are moved to the designated folder.
  • Manage Duplicates: The script checks for duplicate files and deletes the older version.
  • Logging and Time Calculation: All actions are logged, and the time saved by automation is calculated.
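
The steps above can be sketched as a small pipeline. This is an outline under stated assumptions rather than the script itself: the logger name, the 45-second estimate per file, and the choose_name callback are hypothetical, and shutil.move is used here in place of plain os calls because it also works across drives.

```python
import logging
import os
import shutil

logger = logging.getLogger("pdf_organizer")  # hypothetical logger name

# Hypothetical estimate of the manual effort one file takes, in seconds.
SECONDS_PER_FILE = 45


def find_pdfs(folders):
    """Yield the full path of every .pdf file directly inside each folder."""
    for folder in folders:
        for name in sorted(os.listdir(folder)):
            if name.lower().endswith(".pdf"):
                yield os.path.join(folder, name)


def organize(folders, destination, choose_name):
    """Rename and move each PDF; return the number of files moved.

    `choose_name(path)` returns the new filename, or None to skip the file.
    """
    os.makedirs(destination, exist_ok=True)
    moved = 0
    for path in find_pdfs(folders):
        new_name = choose_name(path)
        if new_name is None:
            logger.info("skipped %s", path)
            continue
        target = os.path.join(destination, new_name)
        shutil.move(path, target)
        logger.info("moved %s -> %s", path, target)
        moved += 1
    return moved


def estimate_time_saved(log_lines, seconds_per_file=SECONDS_PER_FILE):
    """Estimate seconds saved by counting 'moved' entries in the log lines."""
    moved = sum(1 for line in log_lines if "moved " in line)
    return moved * seconds_per_file
```

A call might look like organize([downloads, desktop], destination, choose_name), where choose_name reads the PDF text and applies the naming rules. Passing absolute paths makes changing the working directory optional.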
