In this task, you must find the 10 products with the greatest price variation between the price found at an online retailer (Americanas) and the price stipulated by the manufacturer, and the 10 products with the greatest unavailability. The goal is to see whether any products are being sold at prices far from the ideal. I'll provide data collected from retailers on different dates so that you can carry out this activity. Some important points about the structure of this data:
"retailerPrice" -- is the retail price;
"manufacturerPrice" -- is the price indicated by the manufacturer;
"priceVariation" -- is the price variation, which can be negative or positive;
"available" -- indicates whether the product is available for purchase.
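As a rough illustration of the two rankings requested, here is a minimal sketch in plain Python. It makes several assumptions that are not part of the task description: each record carries a hypothetical "productId" field, "greatest price variation" is read as the largest absolute value of "priceVariation" seen for a product, and "unavailability" is counted as the number of collected records in which "available" is false. The sample records are invented for the example only.

```python
from collections import defaultdict


def top_price_variation(records, n=10):
    """Return the n products with the largest absolute price variation.

    Assumption: "productId" is a hypothetical identifier field, and the
    ranking uses the largest absolute "priceVariation" seen per product.
    """
    worst = {}
    for rec in records:
        pid = rec["productId"]
        variation = rec["priceVariation"]
        # Keep the most extreme variation observed for each product
        if pid not in worst or abs(variation) > abs(worst[pid]):
            worst[pid] = variation
    ranked = sorted(worst.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:n]


def top_unavailability(records, n=10):
    """Return the n products most often recorded as unavailable.

    Assumption: unavailability is the count of records where
    "available" is false for a given product.
    """
    counts = defaultdict(int)
    for rec in records:
        if not rec["available"]:
            counts[rec["productId"]] += 1
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Invented sample records, for illustration only
records = [
    {"productId": "A", "retailerPrice": 120.0, "manufacturerPrice": 100.0,
     "priceVariation": 20.0, "available": True},
    {"productId": "A", "retailerPrice": 95.0, "manufacturerPrice": 100.0,
     "priceVariation": -5.0, "available": False},
    {"productId": "B", "retailerPrice": 70.0, "manufacturerPrice": 100.0,
     "priceVariation": -30.0, "available": False},
    {"productId": "B", "retailerPrice": 110.0, "manufacturerPrice": 100.0,
     "priceVariation": 10.0, "available": False},
    {"productId": "C", "retailerPrice": 101.0, "manufacturerPrice": 100.0,
     "priceVariation": 1.0, "available": True},
]

price_ranking = top_price_variation(records, n=2)
unavailability_ranking = top_unavailability(records, n=2)
```

The same grouping and ordering could be expressed with PySpark, which the posting lists as a skill; the plain-Python version is used here only to keep the sketch self-contained.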
The scope of the test must include, but is not limited to:
Data ingestion;
File storage;
Data processing;
Calculation of metrics;
Also:
PyLint application or similar;
Application of Clean Code;
Pipeline diagram;
Readme;
Coverage and Unit Testing;
I'll be providing the files for the task.
Posted On: June 07, 2023 02:23 UTC
Category: Data Analytics
Skills: Python, PySpark, Apache Spark, ETL Pipeline
Country: Brazil