
supriya-premkumar/RESTfulDataAggregator


Description:

This project implements a RESTful data aggregator. The client uploads a CSV or JSON file containing a large number of records (for example, 64 million records in a 1 GB file), specifies a column to group on and a column to aggregate, and receives the aggregated data back as CSV or JSON.

RESTful Endpoints:

POST /api/v1/upload/      Upload a large file (CSV or JSON).
POST /api/v1/aggregate/   Perform aggregation on the previously uploaded file.

Usage Example:

  1. Upload File:

     Request:    curl -i -F file=@<file_path> -F "format=<csv|json>" /api/v1/upload/
     Response:   {"status":"Accepted",
                  "url":"/upload/666b3b22-f161-11e5-9670-060c1144530b",
                  "token":"666b3b22-f161-11e5-9670-060c1144530b"}

On a successful upload, the server returns a token to the client. The client must send this token back to the server when it requests an aggregation.
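A minimal sketch of how a server might mint the upload token and build the acceptance response shown above. The sample token's third group begins with `1` (`11e5`), which marks a version-1 (time-based) UUID, so this sketch uses `uuid.uuid1()`. The helper name `make_upload_response` is hypothetical and is not taken from `views.py`.

```python
import json
import uuid


def make_upload_response(token=None):
    """Build the JSON body returned after an accepted upload (hypothetical helper)."""
    # Default to a version-1 UUID, matching the shape of the sample token above.
    token = uuid.uuid1() if token is None else token
    token_str = str(token)
    return {
        "status": "Accepted",
        "url": "/upload/" + token_str,
        "token": token_str,
    }


if __name__ == "__main__":
    print(json.dumps(make_upload_response()))
```

Because the token also appears in the returned `url`, the client can keep either value; only the bare token is needed for the aggregate call.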

  2. Aggregate File:

  Request:   curl -d "token=666b3b22-f161-11e5-9670-060c1144530b&aggOn=count&grpOn=last_name&outType=csv" /api/v1/aggregate/
  Response:  A streamed file download of the aggregated data, in CSV or JSON as selected by outType.
  

The client passes the token, the grpOn (column to group on) and aggOn (column to aggregate) parameters, and outType to indicate the format (csv or json) it expects the results in.
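On the client side, the form body from the curl example can be built with the standard library. This is a sketch under the assumption that outType accepts only `csv` or `json`; the helper `build_aggregate_body` is hypothetical. The resulting string can be sent with `curl -d` or as the `data` of a `urllib.request` POST.

```python
from urllib.parse import urlencode

VALID_OUT_TYPES = {"csv", "json"}


def build_aggregate_body(token, agg_on, grp_on, out_type="csv"):
    """Encode the form body for POST /api/v1/aggregate/ (hypothetical helper)."""
    if out_type not in VALID_OUT_TYPES:
        raise ValueError("outType must be 'csv' or 'json', got %r" % out_type)
    # Dicts keep insertion order, so the fields appear in the same order
    # as in the curl example: token, aggOn, grpOn, outType.
    return urlencode({
        "token": token,
        "aggOn": agg_on,
        "grpOn": grp_on,
        "outType": out_type,
    })


if __name__ == "__main__":
    print(build_aggregate_body("666b3b22-f161-11e5-9670-060c1144530b",
                               "count", "last_name", "csv"))
```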

  Sample File:
  ------------
  first_name,last_name,count
  Luke,Skywalker,42
  Leia,Skywalker,10
  Anakin,Skywalker,20
  Admiral,Ackbar,10
  Admiral,Thrawn,10
  Kylo,Ren,100

  Command:
  --------
  grpOn=last_name, aggOn=count

  Output:
  -------
  Skywalker:72
  Ackbar:10
  Thrawn:10
  Ren:100
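The grouping in this example is a streaming group-by sum: read rows one at a time and keep a running total per group key. A minimal sketch, assuming CSV input with integer values in the aggregated column; the function names are hypothetical and this is not the code in `views.py`. The exact CSV/JSON layout the service returns is not documented here, so `format_totals` simply prints the same `name:total` form as the Output above.

```python
import csv
import io
import json


def aggregate(lines, grp_on, agg_on):
    """Sum the integer column `agg_on` for each distinct value of `grp_on`.

    `lines` may be any iterable of CSV text lines (e.g. a file opened with
    newline=""), so rows are consumed one at a time rather than all at once.
    """
    totals = {}
    for row in csv.DictReader(lines):
        key = row[grp_on]
        # Dicts preserve first-seen order, so groups come out in input order.
        totals[key] = totals.get(key, 0) + int(row[agg_on])
    return totals


def format_totals(totals):
    """Render totals as 'name:total' lines, as in the Output example."""
    return "\n".join("%s:%d" % (key, value) for key, value in totals.items())


SAMPLE = """first_name,last_name,count
Luke,Skywalker,42
Leia,Skywalker,10
Anakin,Skywalker,20
Admiral,Ackbar,10
Admiral,Thrawn,10
Kylo,Ren,100
"""

if __name__ == "__main__":
    totals = aggregate(io.StringIO(SAMPLE), "last_name", "count")
    print(format_totals(totals))
    print(json.dumps(totals))  # JSON variant of the same result
```

Memory use grows with the number of distinct groups, not the number of rows, which is what makes a single pass over tens of millions of records practical.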

File Description:

All of the functionality is implemented in RESTfulDataAggregator/aggregator/api/views.py.
Routing rules are defined in RESTfulDataAggregator/aggregator/aggregator/urls.py.

Utils:

The utils folder contains scripts for generating test data.

Sample Use:

python RESTfulDataAggregator/utils/TestDataGenerator/dataGenerator.py --fileType csv --fileSize 10 

We specify the file type (--fileType) and the size of the test data file in MB (--fileSize). The script generates a first_name,last_name,count CSV file of the requested size by randomly combining data from

RESTfulDataAggregator/utils/TestDataGenerator/firstNames.in and RESTfulDataAggregator/utils/TestDataGenerator/lastNames.in.
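The generator's approach can be sketched as follows. This is a hypothetical re-creation, not the repository's dataGenerator.py: it assumes each .in file lists one name per line, treats 1 MB as 1024 × 1024 bytes, and draws each count uniformly from 1 to 100. The real script may differ on any of these points.

```python
import os
import random
import tempfile


def load_names(path):
    """Read one name per line, skipping blank lines (assumed .in layout)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def generate_test_csv(path, size_mb, first_names, last_names, seed=None):
    """Write a first_name,last_name,count CSV of at least size_mb megabytes.

    Returns the number of bytes written. Names are assumed to contain no commas.
    """
    rng = random.Random(seed)
    target_bytes = int(size_mb * 1024 * 1024)
    # newline="" stops "\n" being translated, so the byte count stays exact.
    with open(path, "w", encoding="utf-8", newline="") as out:
        header = "first_name,last_name,count\n"
        out.write(header)
        written = len(header.encode("utf-8"))
        while written < target_bytes:
            line = "%s,%s,%d\n" % (rng.choice(first_names),
                                   rng.choice(last_names),
                                   rng.randint(1, 100))
            out.write(line)
            written += len(line.encode("utf-8"))
    return written


if __name__ == "__main__":
    out_path = os.path.join(tempfile.mkdtemp(), "TestData.csv")
    size = generate_test_csv(out_path, 1, ["Luke", "Leia"], ["Skywalker", "Ren"], seed=42)
    print("wrote %d bytes to %s" % (size, out_path))
```

Tracking the byte count while writing avoids querying the file size on every row; the loop stops at the first row that reaches the target, so the file overshoots by at most one row.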

The script writes its output to RESTfulDataAggregator/utils/DataSource/TestData.csv.

A 1 GB test data file contains a whopping 64 million records, and aggregating it takes about 30 seconds on an AWS EC2 instance.
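As a rough sanity check on these figures, taking 1 GB as 10^9 bytes and the 30 seconds at face value:

```latex
\frac{10^{9}\ \text{bytes}}{64 \times 10^{6}\ \text{records}} \approx 15.6\ \text{bytes/record},
\qquad
\frac{64 \times 10^{6}\ \text{records}}{30\ \text{s}} \approx 2.1 \times 10^{6}\ \text{records/s},
\qquad
\frac{10^{9}\ \text{bytes}}{30\ \text{s}} \approx 33\ \text{MB/s}
```

Roughly 16 bytes per record is in the same range as short rows such as Leia,Skywalker,10 from the sample file.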
