DomainRadar Pipeline

This project contains applications that together make up the core process of DomainRadar:

  • the data collectors fetch data on domain names from external sources (such as DNS),
  • the data merger combines the data from all the collectors,
  • the feature extractor computes the feature vector from the data,
  • the classifier pipeline component fetches data from the extractor, passes them to the classifiers (domainradar-clf) and stores the results.

See the DomainRadar Pipeline & Models documentation for detailed information on the individual components, the data flow between them, and the models used.

Architecture

The pipeline consists of a series of lightweight applications that perform a consume-process-produce cycle; the output of one component is the input of another. This way, each pipeline component can be deployed separately and run in one or multiple instances to distribute the workload (limited by the number of partitions configured in Kafka for the source topics). At the end of the pipeline, the merger components combine the results from the collectors.
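
For illustration, a minimal sketch of one consume-process-produce cycle, written against the plain Kafka clients API. The actual components use Parallel Consumer, Faust, or Kafka Streams; the topic names, group ID and the process step below are placeholders:

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class CollectorSketch {
    public static void main(String[] args) {
        var props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");
        props.put("group.id", "example-collector"); // all instances share this ID
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (var consumer = new KafkaConsumer<String, String>(props);
             var producer = new KafkaProducer<String, String>(props)) {
            consumer.subscribe(List.of("input_topic")); // placeholder source topic
            while (true) {
                // Consume: records are spread across partitions, so multiple
                // instances with the same group.id share the workload.
                for (var record : consumer.poll(Duration.ofSeconds(1))) {
                    var result = process(record.value()); // collector-specific work
                    // Produce: the output topic is another component's input.
                    producer.send(new ProducerRecord<>("output_topic", record.key(), result));
                }
            }
        }
    }

    private static String process(String value) {
        return value; // placeholder for e.g. a DNS lookup
    }
}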

The components are implemented using several frameworks:

  • Parallel Consumer for the Java-based collectors,
  • Faust for the Python-based components,
  • Kafka Streams for the data merger.

Running

The Java-based collectors

The collectors based on Parallel Consumer are executed using a common standalone runner. Several different collectors may be started from a single runner instance; in that case, the collectors remain completely independent: each one's consumer group ID is formed as "[provided app ID]-[collector identifier]". All instances of a given collector must be started with the same app ID, but it does not matter which collectors are combined inside a single runner instance.
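
For illustration, a minimal sketch of how the group ID is derived; the app ID and collector identifier below are hypothetical:

String appId = "domainradar-collectors";     // provided app ID (hypothetical)
String collectorId = "dns";                  // collector identifier (hypothetical)
String groupId = appId + "-" + collectorId;  // => "domainradar-collectors-dns"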

The Faust-based collectors

The Faust-based components do not have a shared runner and each must be started separately. It still holds that the same app ID must be used for all running instances of a single component. Refer to the python_pipeline/README.md file for more information.
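
As an illustration only: Faust applications are conventionally started through the framework's worker command, shown below with a placeholder module name. The actual invocation for each component is described in python_pipeline/README.md.

# placeholder module name; see python_pipeline/README.md for the real one
faust -A collector_app worker -l info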

The data merger

The data merger is executed using the Streams runner. You must use the same app ID for all runner instances that contain this component. You can build and start the merger using the following commands:

cd java_pipeline
mvn package -pl streams-components -am
java -cp "streams-components/target/streams-components-1.0.0-SNAPSHOT-jar-with-dependencies.jar" "cz.vut.fit.domainradar.streams.StreamsPipelineRunner" --merger -id domainradar-merger -p "[client config file].properties" -s kafka:9092
# use -Dlog4j2.configurationFile=file:[config file].xml to override the logger configuration
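
The file passed via -p holds standard Kafka client properties. A minimal sketch, assuming a plaintext listener; real deployments will typically also need security settings (e.g. SSL keystore and truststore options):

# [client config file].properties (illustrative content only)
bootstrap.servers=kafka:9092
security.protocol=PLAINTEXT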
