Energy management company requires PDF data scraping automation.

Basically, as an energy broker, our company goes to market to request pricing for our clients' future energy contracts. We receive offers from up to 15 different retailers.

These offers are emailed to us as PDF contracts with different terms (12, 24, 36, 48 months, etc.) and pricing in cents per kWh. The offers also include environmental costs and other details.

What I need: once the offers have been received and the tender has closed, the PDFs will be put in a folder. A program should then go through this folder, extract the relevant data from each PDF and place the scraped data into an Excel spreadsheet.

There are a few variables.
1. Not all retailers offer the same term lengths.
2. Cost breakdowns vary from state to state (Australian States).
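One way to handle the varying term lengths is to give the spreadsheet a fixed column per term and leave a cell empty when a retailer did not quote that term. The sketch below is only an illustration: the term list, the column names and the "24 months: 8.95 c/kWh" wording are assumptions, not taken from any real retailer offer.

```python
import re

# Assumed set of term lengths (months) that the spreadsheet has columns for.
TERMS = (12, 24, 36, 48)

# Matches text such as "24 months: 8.95 c/kWh". The wording is a made-up
# example; each real retailer layout would need its own pattern.
TERM_PRICE = re.compile(
    r"(\d{1,2})\s*months?\D*?(\d+(?:\.\d+)?)\s*c/kWh", re.IGNORECASE
)


def term_columns(text):
    """Map each expected term to its quoted rate, or None if not offered."""
    found = {int(m.group(1)): float(m.group(2)) for m in TERM_PRICE.finditer(text)}
    return {f"rate_{t}m_c_per_kwh": found.get(t) for t in TERMS}
```

A retailer quoting only 12 and 36 months would then produce empty 24 and 48 month cells rather than shifting columns.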

The layouts vary from retailer to retailer, so there is no uniform layout.

89 fields will be needed in the Excel sheet.

Each offer will need to be on its own row, so a maximum of 15 rows. 87 field/column headers. A sample of the data and the format it should be scraped into is below.

Assuming it is best to use PyPDF2 or pdfplumber.

The difficulty is that different scripts will be needed depending on the retailer.
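A possible structure for this is one parser function per retailer, registered against a piece of text that only appears in that retailer's PDFs. This is a minimal sketch: the retailer names, marker strings and function names are all hypothetical, and the parser bodies are placeholders for the real field extraction.

```python
# Maps a retailer name to (marker text, parser function).
# Every name and marker here is a hypothetical placeholder.
PARSERS = {}


def retailer(name, marker):
    """Register a parser for PDFs whose text contains `marker`."""
    def register(func):
        PARSERS[name] = (marker.lower(), func)
        return func
    return register


@retailer("Retailer A", marker="Retailer A Pty Ltd")
def parse_retailer_a(text):
    # Placeholder: field extraction for this retailer's layout goes here.
    return {"retailer": "Retailer A"}


@retailer("Retailer B", marker="Retailer B Energy")
def parse_retailer_b(text):
    # Placeholder: field extraction for this retailer's layout goes here.
    return {"retailer": "Retailer B"}


def identify_retailer(text):
    """Return the first registered retailer whose marker is in the text, else None."""
    lowered = text.lower()
    for name, (marker, _parser) in PARSERS.items():
        if marker in lowered:
            return name
    return None


def parse_offer(text):
    """Pick the matching retailer parser and run it on the extracted text."""
    name = identify_retailer(text)
    if name is None:
        raise ValueError("Unrecognised retailer; add a parser for this layout")
    return PARSERS[name][1](text)
```

Adding a new retailer then means writing one more decorated function, without touching the rest of the program.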

Tasks. Set up an automation to:

1. Identify the retailer the offer has been received from.

2. Scrape the required data from that specific retailer's offer.

3. Go through the folder until all offers/PDFs have been scraped.

4. Compile the data into an Excel sheet.
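The four tasks above could be wired together roughly as follows. This is a sketch under assumptions, not a finished tool: it uses pdfplumber for text extraction and openpyxl for the Excel output (openpyxl is not named in this brief), the function names and the `source_file` column are made up, and `parse` stands for whatever routine identifies the retailer and returns one dict of fields per offer.

```python
from pathlib import Path


def pdf_text(path):
    """Extract the text of every page of one PDF using pdfplumber."""
    import pdfplumber  # imported here so the rest works without it installed

    with pdfplumber.open(path) as pdf:
        # extract_text() can return None for pages without a text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def collect_offers(folder, parse, read_text=pdf_text):
    """Parse every *.pdf in `folder`.

    Returns (rows, failures): one dict per parsed offer, plus a
    (filename, error message) pair for each PDF that could not be parsed.
    """
    rows, failures = [], []
    for path in sorted(Path(folder).glob("*.pdf")):
        try:
            row = parse(read_text(path))
            row["source_file"] = path.name  # keeps each row traceable to its PDF
            rows.append(row)
        except Exception as exc:
            failures.append((path.name, str(exc)))
    return rows, failures


def write_xlsx(rows, headers, out_path):
    """Write one header row, then one row per offer, to an .xlsx file."""
    from openpyxl import Workbook

    workbook = Workbook()
    sheet = workbook.active
    sheet.append(list(headers))
    for row in rows:
        # Fields a retailer did not supply are written as empty cells.
        sheet.append([row.get(header) for header in headers])
    workbook.save(out_path)
```

Collecting failures instead of stopping at the first bad file means one unrecognised PDF does not block the other offers, and the list shows which files need manual review.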

There are a few rules that will need to be included to ensure accuracy.

The data fields and a sample offer are attached for review.



Budget: $800
Posted On: October 02, 2023 01:12 UTC
Category: Data Extraction
Skills: Python, Data Scraping, Automation
Country: Australia