V1

  • output the graph basic
  • graph output prettier
  • combine http and https backends
  • add --max-depth
  • detect service name from /etc/chef/client.rb
  • add nsq graph
  • add --skip-nsq-topics
  • detect defunct children via haproxy stats
  • skip display defunct children
  • haproxy 1.6 and 1.8 compatibility
  • add error state for "null" NSQ clients (use "?")
  • detect missing stats socket haproxy
  • task: validate missing haproxy stats socket config is live manually with knife ssh
  • task: validate missing haproxy stats socket is live manually with knife ssh
  • add "no consumer" NSQ detection
  • task: validate DEFUNCT-ness
  • multiple seeds

V2.async

  • use asyncssh (10x faster!)
  • remove return_exceptions=True
  • use global BASTION connection (ronf/asyncssh#270)
  • limit concurrency w/ semaphore
  • split to modules
  • re-use ssh connection for get-name/get-config calls
  • pass lightweight node-ref through async calls instead of node dict
  • remove pending node print
  • deal with formatting/output-ordering implications
  • convert recursive crawl from await to ensure_future
  • improve live output rendering
  • fix introduced parent['last_sibling'] bug
  • bug: cycle is correct in the tree, but rendering zombie children (only for first level cycles?)
  • retry ssh connection 3 times, fine tune concurrency
  • introduced: --output=stdout is now broken due to render_node_live
  • rename water to water_spout, private module function
  • consolidate find..children error checking
  • validate frontend-router
  • move connection semaphore to ssh_layer
  • better trace/debug log levels
  • consolidate nsq node relationships w/ multiple connections
  • deal w/ SSH config: bastion & username
  • refactors from PR review (reduce complexity, procedural styling)
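A minimal sketch of the "limit concurrency w/ semaphore" and "retry ssh connection 3 times" items, assuming a generic coroutine stands in for the SSH call. `with_retries`, `fake_ssh`, `MAX_CONCURRENCY` and `MAX_ATTEMPTS` are hypothetical names, not spider's code, and treating `OSError`/timeouts as retryable is an assumption.

```python
import asyncio

# Hypothetical settings; the real CLI exposes its own concurrency knob.
MAX_CONCURRENCY = 10
MAX_ATTEMPTS = 3


async def with_retries(make_call, semaphore, attempts=MAX_ATTEMPTS, delay=0.0):
    """Await make_call() under `semaphore`, retrying on connection-style errors.

    make_call must return a fresh coroutine on each call, because a coroutine
    object can only be awaited once.
    """
    last_exc = None
    for attempt in range(1, attempts + 1):
        async with semaphore:  # caps the number of calls in flight
            try:
                return await make_call()
            except (OSError, asyncio.TimeoutError) as exc:  # assumed retryable
                last_exc = exc
        if attempt < attempts:
            await asyncio.sleep(delay)  # back off outside the semaphore
    raise last_exc


async def demo():
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
    state = {"in_flight": 0, "peak": 0}
    failures_left = {"host-a": 2}  # host-a fails twice, then succeeds

    async def fake_ssh(host):
        # Stand-in for an SSH command; records how many calls overlap.
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
        try:
            await asyncio.sleep(0.01)
            if failures_left.get(host, 0) > 0:
                failures_left[host] -= 1
                raise OSError(f"connection to {host} refused")
            return f"{host}: ok"
        finally:
            state["in_flight"] -= 1

    hosts = ["host-a"] + [f"host-{i}" for i in range(30)]
    # No return_exceptions=True: an exhausted retry propagates as an error.
    results = await asyncio.gather(
        *(with_retries(lambda h=h: fake_ssh(h), semaphore) for h in hosts)
    )
    return results, state["peak"]
```

Sleeping between attempts outside the semaphore keeps a failing host from holding a slot that other crawls could use.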

V2.features

  • DISPLAY: output in json
  • DISPLAY: load json file
  • DISPLAY: output in graphviz
  • DISPLAY: graphviz source
  • CRAWL: detect proxysql
  • CRAWL: cassandra
  • CRAWL: detect well known ports w/ netstat & AWS name lookup (cx, memcache, redis)
  • CRAWL: detect postgres well known port - causing trouble w/ name lookup
  • CRAWL: user defined links
  • move hints/skips to web.yaml
  • keep config.yaml
  • CRAWL: kinesis

V2.refactor

  • move grouping of nsq topics to application layer, on service_name instead of IP
  • config_errors -> warnings, crawl_errors -> errors
  • refactor ssh config to ssh config file
  • refactor --hide-defunct to --skip-defunct and do not even crawl defunct nodes
  • graphviz warn/error color coding
  • remove "cruft" handling
  • add quick filter to rewrite service_name mysql-main-port_3306 to mysql-main-r/o
  • create objects or named tuples (dataclasses!)
  • PEP8, 120 line length
  • CHARLOTTE: make the get_config function into configurable parsers definable in YAML
  • charlotte: replace 'null' response from NSQ for missing IP w/ actual None response
  • charlotte: move crawl strategy exceptions (frontend-router) into charlotte
  • charlotte: move blocking logic to charlotte
  • charlotte: rename crawl_strategy -> crawl_provider on Node()
  • charlotte: move service_name_rewrite to charlotte
  • rename protocol_detail -> protocol_mux
  • CHARLOTTE: --skip-{name} arguments
  • --skip-defunct -> --hide-defunct
  • refactor database name matching to port matching
  • move skip services from globals to argparse
  • move crawl_complete, name_lookup_complete to node.py
  • charlotte config 1 file to directory of yaml files
  • create default yaml file for argparse
  • rename ip -> instance_address
  • remove crawl strategy object from Node, denormalize (protocol, blocking)
  • merge hints into pre-existing children w/ unknown address
  • CORE: add sub commands for ['crawl', 'render-json']
  • CORE (OSS): unit tests tests tests (round I - excluding provider_*.py and crawl.py)
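A minimal sketch of the "dataclasses!" direction using the renamed fields (instance_address, protocol_mux), with an explicit type tag in the JSON so deserialization does not key off incidental fields. The exact field set and the `to_dict`/`from_dict` names are hypothetical, not the project's node.py.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node shape using the renamed fields from the items above."""
    service_name: Optional[str]
    instance_address: Optional[str]  # formerly "ip"
    protocol_mux: Optional[str] = None  # formerly "protocol_detail"
    children: List["Node"] = field(default_factory=list)

    def to_dict(self) -> dict:
        # An explicit "type" tag means deserialization never has to guess
        # the object kind from whichever other keys happen to be present.
        return {
            "type": "Node",
            "service_name": self.service_name,
            "instance_address": self.instance_address,
            "protocol_mux": self.protocol_mux,
            "children": [child.to_dict() for child in self.children],
        }

    @classmethod
    def from_dict(cls, data: dict) -> "Node":
        if data.get("type") != "Node":
            raise ValueError(f"unexpected type tag: {data.get('type')!r}")
        return cls(
            service_name=data["service_name"],
            instance_address=data["instance_address"],
            protocol_mux=data.get("protocol_mux"),
            children=[cls.from_dict(child) for child in data.get("children", [])],
        )
```

Because dataclasses generate `__eq__`, a JSON round trip can be checked with a single comparison against the original tree.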

V2.bugs

  • BUG: nsq channels on same node are not grouping, again!
  • there is a regression in cycle detection - spider against async-cake-handler to repro
  • trim double quotes from service_name
  • BUG: crawl of well-known port is discovering random connections to frontend-routers and ELBs; fixed by chris r. with a source ephemeral port filter
  • 'CYCLE': f"service '{node['service_name']}' discovered as a parent of itself!",
  • paramiko nested exception outputting
  • handle an actually null (absent value) nsq consumer in addition to the string literal "null"
  • ascii renderer grouping by detail is persisting in memory (groupings)
  • charlotte: move name parser exceptions (mysql-main) into charlotte
  • we see many repeating group by service-name NSQ topic/channels repeating in ascii renderer
  • catch timeout for crawling children
  • remove trailing _ from node_ref
  • graphviz blocking is backwards
  • regression defunct in parser check on num_connections == 0 is failing
  • differentiate RDS databases found in AWS - currently all show as rdsnetworkinterface
  • BUG: add type to json serialization - currently brittle: key-ing off of random fields for deserialization
  • infinite recursion bug introduced by the crawl hints; it was caused by the cached_nodes in crawl.py being by-reference objects, and a deep-ish copy fixed it
  • trying to crawl json that was output with the --depth arg results in wait_for_crawl hanging while waiting for nodes to complete
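A minimal sketch of the cycle check behind the 'CYCLE' error message above: a depth-first walk that tracks the current ancestor path and reports the first service that reappears on it. The adjacency-dict input and the `find_cycle`/`cycle_error` names are hypothetical; the edges in the test are made up.

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]], seed: str) -> Optional[List[str]]:
    """Return the first path that revisits an ancestor, or None if acyclic."""
    path: List[str] = []  # current chain of ancestors, seed first
    on_path = set()       # same names as `path`, for O(1) membership checks
    done = set()          # fully explored nodes, never revisited

    def visit(name: str) -> Optional[List[str]]:
        if name in on_path:
            # The node is its own ancestor: slice out the loop.
            return path[path.index(name):] + [name]
        if name in done:
            return None
        path.append(name)
        on_path.add(name)
        for child in graph.get(name, []):
            cycle = visit(child)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(name)
        done.add(name)
        return None

    return visit(seed)


def cycle_error(cycle: List[str]) -> str:
    # Same wording as the 'CYCLE' message in the list above.
    return f"service '{cycle[0]}' discovered as a parent of itself!"
```

Keeping `done` separate from `on_path` matters: a diamond (two parents sharing one child) is not a cycle and must not be reported as one.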

V3 Kubernetes++

  • CRAWL: kubernetes - take a hint
  • CRAWL: kubernetes - name lookup, crawl
  • support EKS cluster in a different AWS account than provider_aws

V3.refactor

  • static code analysis (prospector) and forthcoming changes
  • refactor providers to objects, remove SSH logic from crawl.py
  • caching children in crawl.py instead of providers!!
  • fix TIMEOUT logic
  • put provider_args back in crawl strategies! use **kwargs to pass args in code
  • rewrite provider registration
  • move provider constant refs from constants.py into providers
  • rename errors.NULL_IP NULL_ADDRESS
  • refactor signature of crawl_downstream to include address
  • replace pass through node_ref in crawl w/ zip()
  • unit tests for crawl, providers, provider_*?
  • validate that crawl strategies are only used for specified providers
  • refactor lookup_name to remove life360 business logic from providers!
  • remove ProviderInterface::configure(), have ssh configure itself on first query
  • seed provider is configurable command line arg w/

V3.features

  • FEATURE: make instance_provider args for aws hints part of a refactored "profile"
  • FEATURE: Distinguish kubernetes service shape in graphviz
  • add --stop-on-nonblocking CLI arg

V3.bugs

  • not respecting CrawlStrategy.providers
  • need to be able to configure different AWS profile for k8s/eks than for aws! (for dev)
  • BUG: intermittent timeout exceptions which do not result in program exit

V4.VOSS

  • REFACTOR: (providers): providers as plugin architecture
  • REFACTOR (spider): --concurrency -> --ssh-concurrency OR provider args
  • REFACTOR: (all): refactor package architecture
  • TIMEOUT: (crawl) robust provider timeout and exception handling
  • OBSCURIFIER (render_*): obscurifier for output
  • BUG: fix namespace package not being included in dist

V5.PROMVIZ

  • [~] promviz render output
    • render nsq
    • haproxy http enabled in prod
    • render haproxy
    • render proxysql
    • BUG: geonames orphaned due to no data returned query
    • render haproxy tcp mode
    • render elasticache
    • render kinesis
    • render custom queries
  • merge hints
  • add missing hints
  • render_promviz tests
  • refactor renderers to plugins
  • fix plugin imports
  • refactor/DRY providers/renderers to plugin_core.py
  • how to organize plugin tests?
  • move constants.ARGS to cli_args.ARGS
  • update examples plugins/crawl strategies/docs
  • [~] PLUGINS: BUG namespace plugins aren't pip install --editable-able

V5.1 NICETOHAVES

  • ci/cd run tests
  • ci/cd publish PyPI package
  • annotate services w/ links to wiki/github

Backlog

New Features

Core

  • RENDER_PLUGINS: make renderers an abstract class w/ plugins
  • REFACTOR: move seed logic out of ./spider.py
  • REFACTOR: revisit the Node{Protocol, CrawlStrategy, protocol_mux} object relationship strategy
  • FEATURE: track whether a node was skipped for crawling and display as such in graphviz
  • REFACTOR: move errors/warnings to a global config
  • REFACTOR: do not block crawl() on lookup_name() in main crawl loop. will speed up many times
  • REFACTOR: move mutex from provider_ssh to crawl.py
  • BUG: intermittent timeouts crawling the whole tree - add retry to lookup_name/crawl_downstream?
  • BUG: remove blocking from CrawlStrategy - it should only be in Protocol
  • BUG: where is elasticache-time-points? crawl-netstat only takes 1 ip per port, so for async-soa which has 2 downstreams on 6379, it can't find
  • BUG: where is cx-dvb??
  • REFACTOR: consolidate Node::crawl_complete and crawl.py::_crawlable()
  • BUG: required args showing as optional in --help
  • DOCS: remove non obfuscated example video from README
  • LOGGER: rewrite logger access for community standards
  • PLUGPLAY: out of the box functionality by moving TCP to a "builtin" CrawlStrategy and using hostname or default service name
  • REFACTOR: (providers): rewrite take_a_hint to not return a list, just return a single NodeTransport
  • DOCS: rewrite docs in sphinx style and prepare for export to readthedocs.org
  • FEATURE: a new render format that has a predictable sort order, and on top of that the ability to diff
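A minimal sketch of the predictable-sort-order render plus diff idea, assuming the crawl result can be reduced to a nested {service_name: {child: {...}}} dict. That tree shape and the `render_sorted`/`diff_renders` names are hypothetical; the diffing uses the standard library's `difflib.unified_diff`.

```python
import difflib
from typing import Dict, List


def render_sorted(tree: Dict[str, dict]) -> List[str]:
    """Render a nested {name: {child: {...}}} tree as indented, sorted lines."""
    lines: List[str] = []

    def walk(node: Dict[str, dict], depth: int) -> None:
        for name in sorted(node):  # sorting makes the output order-independent
            lines.append("  " * depth + name)
            walk(node[name], depth + 1)

    walk(tree, 0)
    return lines


def diff_renders(before: Dict[str, dict], after: Dict[str, dict]) -> List[str]:
    # Two crawls of the same topology render identically, so any diff
    # lines reflect real topology changes, not crawl ordering.
    return list(difflib.unified_diff(
        render_sorted(before), render_sorted(after),
        fromfile="before", tofile="after", lineterm="",
    ))
```

Since async crawls finish in nondeterministic order, sorting at render time is what makes two runs comparable.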

Renderers

  • test coverage for renderers.py

Render ASCII

  • FEATURE: merge hints in ascii output

Render Graphviz

  • FEATURE: multiple seeds display with equal ranking
  • FEATURE: nsq topics as nodes rather than edges
  • FEATURE: visualize cycles
  • FEATURE: different visualization for cache vs database
  • FEATURE: create a legend

Render JSON

Render New

  • DISPLAY: output in vizceral format
  • DISPLAY: 'diff' run on multiple seed nodes and diff!

CrawlStrategies

  • BUG: HAProxy: functionality to detect a bad HAProxy config as a crawl error was lost in the async refactor: if stdout.startswith('ERROR:'): return 'CRAWL ' + stdout.replace("\n","\t"), {}
  • BUG: NSQ: misconfigured clients have null server (this is why we don't see rattail -> relapse), investigate & resolve
  • FEATURE: Netstat: use matchAddress for HAProxy crawl strategies to avoid timeout to RDS hostnames
  • FEATURE: crawl downstream - ability to specify more providers args per provider (so that k8s can selectively crawl containers)
  • FEATURE: detect multiple downstream on same port with NetstatCrawlStrategy - it will only pick up the first
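A minimal sketch of collecting every downstream per port from `netstat -tn` style output instead of only the first, which is what would surface both 6379 downstreams in the elasticache case above. It assumes the Linux column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); `downstreams_by_port` is a hypothetical helper, not NetstatCrawlStrategy's code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def downstreams_by_port(netstat_output: str, ports: Iterable[int]) -> Dict[int, Set[str]]:
    """Map each wanted remote port to ALL remote hosts connected on it."""
    wanted = set(ports)
    found: Dict[int, Set[str]] = defaultdict(set)
    for line in netstat_output.splitlines():
        cols = line.split()
        # Skip banner/header lines and anything that is not an open TCP connection.
        if len(cols) < 6 or not cols[0].startswith("tcp") or cols[5] != "ESTABLISHED":
            continue
        # rpartition keeps IPv6 addresses intact: "::1:6379" -> ("::1", "6379")
        host, _, port = cols[4].rpartition(":")
        if port.isdigit() and int(port) in wanted:
            found[int(port)].add(host)  # a set, so every distinct host is kept
    return dict(found)
```

Returning a set per port (rather than the first match) leaves it to the caller to create one child node per downstream.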

Providers

  • BUG: cli arg --disable-providers is broken

Provider SSH

  • FEATURE: revisit whether occupy_one_sempahore_space is working (to dynamically configure --concurrency)
  • FEATURE: still getting ssh connection errors sometimes with out --concurrency=10
  • FEATURE: configurable "~/.ssh/config" SSH profile
  • REFACTOR (provider_ssh): we shouldn't use known_hosts=None for security reasons
  • TEST: write tests for provider_ssh

Provider AWS

  • FEATURE: lookup_name is slow, use async
  • CRAWL: dynamodb
  • CRAWL: SQS
  • TEST: write tests for provider_aws

Provider K8S

  • TEST: write tests for provider_k8s

Charlotte

  • FEATURE (charlotte): yaml validation by schema

Web

Trash Can

  • backwards compatibility for haproxy w/out stats socket
  • detect live traffic netstat/tcpdump/ebpf? (this was solved by using haproxy stats)
  • remove crawl_strategy from Node()