1. Use AWS Glue to create a data ingestion pipeline for three files (project_transaction, product, and demographic). Write the output in Parquet format, store it in three tables, and build a small data model.

2. Create the following data reports:
For each product, in which time period is the sales value at its maximum?
What is the maximum quantity sold in a single row? Inspect that row as well: does it have a high discount percentage?
Total sales value per basket (sum of sales value / nunique basket_id).
Total sales value per household (sum of sales value / nunique household_key).
Which products had the most sales by sales_value? Plot a horizontal bar chart.
Did the top 10 selling items have a higher-than-average discount rate?
What was the most common PRODUCT_ID among rows belonging to our top 10 households by sales value?
Look up the names of the top 10 products by sales in the products.csv dataset.
Look up the product name of the item that had the highest quantity sold in a single row.
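A few of the reports above can be sketched directly in pandas. The frame below is hypothetical sample data; the column names (`basket_id`, `household_key`, `sales_value`, `quantity`) are assumptions matching the report descriptions:

```python
import pandas as pd

# Hypothetical sample of the transaction table.
tx = pd.DataFrame({
    "household_key": [1, 1, 2, 2, 3],
    "basket_id":     [10, 10, 20, 30, 40],
    "product_id":    [100, 101, 100, 102, 101],
    "quantity":      [2, 1, 5, 1, 3],
    "sales_value":   [4.0, 2.5, 10.0, 3.0, 7.5],
})

# Total sales value per basket: sum of sales_value / nunique basket_id.
sales_per_basket = tx["sales_value"].sum() / tx["basket_id"].nunique()

# Total sales value per household: sum of sales_value / nunique household_key.
sales_per_household = tx["sales_value"].sum() / tx["household_key"].nunique()

# Products ranked by total sales_value; a horizontal bar chart would
# follow with top_products.plot.barh().
top_products = (tx.groupby("product_id")["sales_value"].sum()
                  .sort_values(ascending=False))

# The single row with the maximum quantity sold.
max_qty_row = tx.loc[tx["quantity"].idxmax()]
```

The product-name lookups would then join `top_products.index` and `max_qty_row["product_id"]` against the products table.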
3. Perform a regression analysis for each product: sales vs. rebate.
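Step 3 can be sketched as a per-product ordinary least squares fit. The data below is hypothetical, and `rebate` is an assumed column name (it might in practice be derived from a discount field divided by sales value):

```python
import numpy as np
import pandas as pd

# Hypothetical per-row observations for two products.
df = pd.DataFrame({
    "product_id":  [1]*4 + [2]*4,
    "rebate":      [0.0, 0.1, 0.2, 0.3, 0.0, 0.1, 0.2, 0.3],
    "sales_value": [10.0, 12.0, 14.0, 16.0, 5.0, 5.5, 6.0, 6.5],
})

# Fit sales_value = slope * rebate + intercept separately per product.
results = {}
for pid, g in df.groupby("product_id"):
    slope, intercept = np.polyfit(g["rebate"], g["sales_value"], 1)
    results[pid] = (slope, intercept)
```

A positive slope would suggest that deeper rebates coincide with higher sales for that product; for a production job the same fit could run in parallel via a PySpark grouped-map function.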

Budget: $150
Posted On: June 21, 2023 00:51 UTC
Category: Data Analytics
Skills: AWS Glue, PySpark, Data Engineering, Machine Learning, Amazon SageMaker
Country: United States