Hi there! I want to build an automated feed of forward-looking, daily flight price data for Qantas, taken from the Qantas website (https://www.qantas.com/us/en/book-a-trip/flights.html).

Each day, I want to search the first 5 morning flights on each of 12 routes (SYD-MEL, SYD-BNE, SYD-CBR, MEL-SYD, MEL-BNE, MEL-CBR, BNE-SYD, BNE-MEL, BNE-CBR, CBR-SYD, CBR-MEL, CBR-BNE) and return the Business, Flex and Economy airfares for each flight.
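
As a minimal sketch, the 12 routes are every ordered pair of the four airports, so the route list could be generated rather than typed out. The names AIRPORTS, FARE_CLASSES, FLIGHTS_PER_ROUTE and ROUTES are only illustrative.

```python
from itertools import permutations

# Airport codes and fare classes taken from the description above.
AIRPORTS = ["SYD", "MEL", "BNE", "CBR"]
FARE_CLASSES = ["Business", "Flex", "Economy"]
FLIGHTS_PER_ROUTE = 5  # first 5 morning flights of the day

# Every ordered origin-destination pair (A-B and B-A are separate routes).
ROUTES = [f"{origin}-{dest}" for origin, dest in permutations(AIRPORTS, 2)]

print(len(ROUTES))  # 12
print(ROUTES[:3])   # ['SYD-MEL', 'SYD-BNE', 'SYD-CBR']
```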

Each departure date would be tracked from 90 days before departure until the day before departure, and I would like to collect a full set of data for a rolling 60-day window of departure dates (150 days of collection in total). For example:
On day t (day 0), extract prices for day t+90.
On day t+1, extract prices for days t+90 and t+91.
On day t+60, extract prices for days t+90 to t+150.
On day t+90, extract prices for days t+91 to t+150.
On day t+91, extract prices for days t+92 to t+150.
On day t+149, extract prices for day t+150.
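
To make the schedule above concrete, here is a rough Python sketch of which departure days would be searched on a given collection day. It assumes day offsets are counted from t = 0, that departure days run from t+90 to t+150 inclusive as in the example, and that each departure day is tracked from 90 days out until the day before. The function name and its parameters are placeholders, not part of any existing tool.

```python
def departure_days(collection_day, first_dep=90, last_dep=150, lead=90):
    """Return the departure day offsets to search on a given collection day.

    A departure day D is searched on collection day d when d < D <= d + lead,
    i.e. from `lead` days before departure until the day before departure.
    """
    start = max(first_dep, collection_day + 1)
    end = min(last_dep, collection_day + lead)
    return list(range(start, end + 1))


# Reproduce the example days listed above.
for d in (0, 1, 60, 90, 91, 149):
    days = departure_days(d)
    print(f"day t+{d}: t+{days[0]} to t+{days[-1]} ({len(days)} departure days)")
```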

If each flight on each day is one row (with the three class airfares as columns), there would be 5*12*90*60 = 324,000 rows of data at the end of the collection period. From day 1 to day 60, each day's extract will have 60 more rows than the previous day's. On days 60-90 there will be 3,600 rows per day. On days 91-150 there will be 60 fewer rows each day.
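
The row counts can be sanity-checked with a short sketch. It assumes 5 flights x 12 routes = 60 rows per departure day per collection day, and 90 daily observations per departure day. One assumption matters here: the 324,000 total corresponds to 60 departure days, whereas the example schedule read literally (t+90 to t+150 inclusive) spans 61 departure days, which would give slightly higher counts. The function and parameter names are illustrative only.

```python
ROWS_PER_DEPARTURE_DAY = 5 * 12  # 5 flights x 12 routes
LEAD = 90  # each departure day is tracked for 90 collection days


def rows_on_day(d, first_dep, last_dep):
    """Rows extracted on collection day d for departure days first_dep..last_dep."""
    start = max(first_dep, d + 1)
    end = min(last_dep, d + LEAD)
    return max(0, end - start + 1) * ROWS_PER_DEPARTURE_DAY


def total_rows(first_dep, last_dep):
    """Total rows over all collection days from day 0 to the day before last_dep."""
    return sum(rows_on_day(d, first_dep, last_dep) for d in range(last_dep))


print(total_rows(91, 150))  # 60 departure days -> 324000
print(total_rows(90, 150))  # 61 departure days -> 329400
```

With 60 departure days the peak is 3,600 rows per day; with 61 it is 3,660, so it is worth confirming which window is intended before sizing the extract.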

I have attached an example of what I hope the extract could look like, as well as a visual of the number of rows for each day over the 150 day period.

Please let me know if this is something you can help with.

Thanks,
Julian


Posted On: March 01, 2023 05:23 UTC
Category: Data Mining
Skills: Automation, Selenium, Python Script, Scripting, Web Crawling, Data Extraction, Data Scraping, Data Mining, Python, Lead Generation, Scrapy
Country: United States