Build API that converts PDF into CSV - Upwork
The description is in this Google Doc:
https://docs.google.com/document/d/1bK0_L597HJIRxu-wRC8adkf_XBNIXBTZE2dA7glc-gU
Boletin Judicial PDF to CSV
UpWork Project
Introduction
The overall goal is to build an API that converts a large PDF into a CSV. This API will eventually run daily, but this project focuses on a single run.
The sample PDF has 176 pages divided into various sections. Each section must be broken down into CSV format following the structure in this spreadsheet.
About the PDF
This PDF is a daily ‘newspaper’ published by the judicial branch of Mexico City. It notifies the public of updates on court cases.
Example: You want to get divorced, so you go to court against your spouse. You will visit the court in person several times, and after every visit the judge will publish an update in the Boletín Judicial. This update will have the following structure:
Analogy: the Boletín Judicial PDF plays the same role as Facebook notifications.
 | Boletin Judicial PDF | Facebook
Notification | “The judge has an update regarding your case” | “Your friend has a message for you”
Message | [Not available in sample PDF] | [what you see after clicking on the notification]
Required Experience
RESTful API Development
Node JS
Express JS
Regular Expressions
Ideas on How to Solve
Every case has a number which in Spanish is “Número de Expediente,” but you will see the short version: “Núm. Exp.” This helps to find the case number with the regular expression below.
Regular Expressions
Use Case | Example | Regex
Número de Expediente | Núm. Exp. 61/2015. |
 | Núm. Exp. 549/2022. |
 | Núm. Exp. 1116/2022. |
Logic
One way to solve it is to scan the entire PDF text for every “Núm. Exp.” occurrence with a regular expression, then save the content between consecutive occurrences to its corresponding case.
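The scan-and-split idea above could be sketched roughly as follows. This is only an illustration, not the regex from the original brief: the pattern `CASE_NUMBER` and the helper `splitByCaseNumber` are hypothetical names, the pattern assumes every case number looks like the three examples (digits, a slash, a four-digit year), and the sketch assumes the text that follows a case number belongs to that case. If the real PDF puts the relevant text before the number, the slicing would need to be reversed. It also assumes the PDF has already been converted to plain text by some other step.

```javascript
// Hypothetical pattern, inferred only from the examples
// "Núm. Exp. 61/2015.", "Núm. Exp. 549/2022.", "Núm. Exp. 1116/2022."
const CASE_NUMBER = /Núm\.\s*Exp\.\s*(\d+)\/(\d{4})/g;

// Splits already-extracted PDF text into one record per case number.
// Assumption: the content AFTER a case number (up to the next one)
// belongs to that case.
function splitByCaseNumber(text) {
  // Normalize so an accented "ú" stored as two code points still matches.
  const normalized = text.normalize('NFC');
  const matches = [...normalized.matchAll(CASE_NUMBER)];

  return matches.map((match, i) => {
    const start = match.index + match[0].length;
    const end = i + 1 < matches.length ? matches[i + 1].index : normalized.length;
    return {
      caseNumber: `${match[1]}/${match[2]}`,
      // Drop the period and whitespace that trail the case number.
      content: normalized.slice(start, end).replace(/^[.\s]+/, '').trim(),
    };
  });
}
```

Calling `splitByCaseNumber(pdfText)` returns an array of `{ caseNumber, content }` objects, which can then be mapped onto whatever columns the sample output sheet defines.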
Resources
All relevant resources will be placed in this folder.
Specifically, the resources are:
Boletín Judicial - Sample PDF - this is an actual publication of Mexico City’s court system notifications.
Boletín Judicial - Visual Breakdown Slides - use these Google Slides to understand how the PDF is structured.
Boletín Judicial - CSV Sample Output - this is the desired sample output. Your deliverable should have exactly the same structure as this sample.
Deliverables
These are the expected deliverables:
Google Sheet Output
This sheet will be very similar to the Boletín Judicial - CSV Sample Output mentioned above.
Must have visibility open to at least ‘Anyone can comment’
API
Built with Node JS and Express JS.
GitHub project shared with me @noeldelgadom
Hosted on a free platform of your choice
A GET endpoint, ready to test, that returns the output CSV
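As a rough illustration of the GET deliverable, the sketch below serializes rows to CSV and wraps that in a request handler. Everything named here is an assumption: `csvField`, `toCsv`, `makeCsvHandler`, the route path, and the column names are hypothetical, and the real columns must come from the CSV sample output. The handler only uses `setHeader` and `end`, which Express response objects inherit from Node's `http.ServerResponse`, so it could be mounted with something like `app.get('/boletin.csv', makeCsvHandler(columns, getRows))`.

```javascript
// Quote a field when it contains a comma, a double quote, or a line break,
// doubling any embedded quotes (the usual RFC 4180 convention).
function csvField(value) {
  const s = String(value ?? '');
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// columns: array of header names; rows: array of objects keyed by those names.
function toCsv(columns, rows) {
  const lines = [columns, ...rows.map((row) => columns.map((c) => row[c]))];
  return lines.map((line) => line.map(csvField).join(',')).join('\r\n') + '\r\n';
}

// Builds a (req, res) handler that answers with the CSV as text/csv.
// getRows is a hypothetical callback that returns the parsed cases.
function makeCsvHandler(columns, getRows) {
  return (req, res) => {
    res.statusCode = 200;
    res.setHeader('Content-Type', 'text/csv; charset=utf-8');
    res.end(toCsv(columns, getRows()));
  };
}
```

Keeping the serialization in a plain function separate from the handler means the CSV output can be checked against the sample sheet without starting a server.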
Timeline
This project should take 2 weeks at most.
Future Work
This is the first project of what could become a larger project. In fact, the original PDF has 457 pages with even more sections. The ultimate goal is to transcribe that PDF.
This first project, with the 176-page PDF, will serve as an evaluation to determine whether we continue with the 457-page PDF.
Let me know if you have any questions.
Budget: $250
Posted On: March 28, 2023 01:00 UTC
Category: Scripting & Automation
Skills: ExpressJS, Node.js, RESTful API, API
Country: Mexico