This project implements a RESTful data aggregator. A client uploads a file with a large number of records (on the order of 64 million) as either a csv or a json file, specifies a column to group on and a column to aggregate, and receives the aggregated data back as csv or json.
POST /api/v1/upload/ — upload a large file (either csv or json).
POST /api/v1/aggregate/ — perform aggregation on the previously uploaded file.
- Upload File:
Request:
    curl -i -F file=@<file_path> -F "format=<csv|json>" /api/v1/upload/
Response:
    {"status":"Accepted", "url":"/upload/666b3b22-f161-11e5-9670-060c1144530b", "token":"666b3b22-f161-11e5-9670-060c1144530b"}
On a successful upload the server returns a token to the client. The client must send this token back to the server when it wants to perform aggregation.
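The token in the sample response has the shape of a version-1 UUID. A minimal sketch of how a server might mint and track such tokens (the function name and the in-memory `uploads` mapping are illustrative assumptions, not the project's actual code):

```python
import uuid


def mint_upload_token(uploads: dict, file_path: str) -> str:
    """Mint a token for an uploaded file and remember where it was stored.

    `uploads` maps token -> stored file path (a hypothetical storage scheme).
    """
    token = str(uuid.uuid1())  # e.g. "666b3b22-f161-11e5-9670-060c1144530b"
    uploads[token] = file_path
    return token


uploads = {}
token = mint_upload_token(uploads, "/tmp/TestData.csv")
print(token)
```

A later aggregation request would then look up the uploaded file by this token.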
- Aggregate File:
Request:
    curl -d "token=666b3b22-f161-11e5-9670-060c1144530b&aggOn=count&grpOn=last_name&outType=csv" /api/v1/aggregate/
Response:
    Either a csv or json aggregated file, streamed as a download.
The client passes the token, the grpOn and aggOn parameters, and outType to indicate the format it expects the results back in.
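Once the aggregation has been computed, the result can be serialized according to outType. A minimal sketch (the helper name is an assumption, not the project's actual code):

```python
import csv
import io
import json


def serialize_result(result: dict, out_type: str) -> str:
    """Render a {group_key: aggregated_value} mapping as csv or json text."""
    if out_type == "json":
        return json.dumps(result)
    # Default to csv: one "key,value" row per group.
    buf = io.StringIO()
    writer = csv.writer(buf)
    for key, value in result.items():
        writer.writerow([key, value])
    return buf.getvalue()


print(serialize_result({"Skywalker": 72, "Ren": 100}, "json"))
```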
Sample File:
------------
first_name,last_name,count
Luke,Skywalker,42
Leia,Skywalker,10
Anakin,Skywalker,20
Admiral,Ackbar,10
Admiral,Thrawn,10
Kylo,Ren,100
Command:
--------
grpOn=last_name, aggOn=count
Output:
-------
Skywalker:72
Ackbar:10
Thrawn:10
Ren:100
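The group-and-sum step above can be sketched with Python's csv module and a defaultdict. This is a simplified in-memory version using the sample data, not necessarily how views.py processes large uploads:

```python
import csv
import io
from collections import defaultdict

SAMPLE = """first_name,last_name,count
Luke,Skywalker,42
Leia,Skywalker,10
Anakin,Skywalker,20
Admiral,Ackbar,10
Admiral,Thrawn,10
Kylo,Ren,100
"""


def aggregate(csv_text: str, grp_on: str, agg_on: str) -> dict:
    """Sum the `agg_on` column per distinct value of the `grp_on` column."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[grp_on]] += int(row[agg_on])
    return dict(totals)


print(aggregate(SAMPLE, "last_name", "count"))
# → {'Skywalker': 72, 'Ackbar': 10, 'Thrawn': 10, 'Ren': 100}
```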
All the functionality can be found in: RESTfulDataAggregator/aggregator/api/views.py
Routing rules are in: RESTfulDataAggregator/aggregator/aggregator/urls.py
You can also find a utils folder that contains test data generation scripts.
python RESTfulDataAggregator/utils/TestDataGenerator/dataGenerator.py --fileType csv --fileSize 10
We specify the file type and the size of the test data file in MB. The script generates a first_name,last_name,count csv file of the required size by randomizing data from
RESTfulDataAggregator/utils/TestDataGenerator/firstNames.in and RESTfulDataAggregator/utils/TestDataGenerator/lastNames.in
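A minimal sketch of what such a generator might do. The real dataGenerator.py may differ; here the name pools are inlined stand-ins instead of being read from the .in files:

```python
import random

FIRST_NAMES = ["Luke", "Leia", "Anakin", "Admiral", "Kylo"]  # stand-in for firstNames.in
LAST_NAMES = ["Skywalker", "Ackbar", "Thrawn", "Ren"]        # stand-in for lastNames.in


def generate_csv(path: str, target_mb: int) -> None:
    """Write random first_name,last_name,count rows until roughly target_mb MB."""
    target_bytes = target_mb * 1024 * 1024
    written = 0
    with open(path, "w") as out:
        header = "first_name,last_name,count\n"
        out.write(header)
        written += len(header)
        while written < target_bytes:
            row = (f"{random.choice(FIRST_NAMES)},"
                   f"{random.choice(LAST_NAMES)},"
                   f"{random.randint(1, 100)}\n")
            out.write(row)
            written += len(row)


generate_csv("/tmp/TestData.csv", 1)
```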
You can find the output of this script at: RESTfulDataAggregator/utils/DataSource/TestData.csv
For 1 GB of test data there are roughly 64 million records, and the aggregation completes in about 30 seconds on an AWS EC2 instance.