update README, import TODO, update .gitignore
Patrick Kennel committed Jul 2, 2020
1 parent b0de5a6 commit 5a2a263
Showing 3 changed files with 221 additions and 8 deletions.
5 changes: 3 additions & 2 deletions .gitignore
@@ -140,5 +140,6 @@ cython_debug/
# IDEs
.idea

# outputs
outputs/
# itsybitsy
outputs/
.lastrun.json
8 changes: 2 additions & 6 deletions README.md
@@ -10,14 +10,11 @@ Configure charlotte, give it a seed node, and it crawls the graph/tree of your s
* python >= 3.8 was chosen in order to use unittest.mock AsyncMock
* dot/graphviz binaries installed in system PATH (e.g. `brew install graphviz`)


## Configure itsybitsy in 8 easy steps!
1. Clone itsybitsy
1. `git clone git@github.com/life360/itsybitsy`
## Configure itsybitsy in 7 easy steps!
1. Review the example project in [examples/example-project](examples/example-project)
1. Start a new project / empty folder
1. `mkdir myitsybitsy && cd myitsybitsy`
1. `echo "-e /Users/patrick/repos/itsybitsy" > requirements.txt`
1. `echo "git+ssh://git@github.com/life360/itsybitsy.git#egg=itsybitsy" > requirements.txt`
1. `pip install -r requirements.txt`
1. Configure charlotte - the configuration engine with which you will describe your service graph to itsybitsy
1. `mkdir charlotte.d`
@@ -31,7 +28,6 @@ Configure charlotte, give it a seed node, and it crawls the graph/tree of your s
1. Hint: `spider.conf` is always inherited, but you can create different profiles such as `spider.prod.conf` and reference them with the `--profile` arg
1. Note: unlike the `spider` command, `render` is written to stand alone and parses the default json file in `outputs/.lastrun.json`, so it requires no arguments by default.


## Use
#### 1 Run in `spider` mode:

216 changes: 216 additions & 0 deletions TODO.md
@@ -0,0 +1,216 @@
# V1

* [x] output the graph basic
* [x] graph output prettier
* [x] combine http and https backends
* [x] add --max-depth
* [x] detect service name from /etc/chef/client.rb
* [x] add nsq graph
* [x] add --skip-nsq-topics
* [x] detect defunct children via haproxy stats
* [x] skip display defunct children
* [x] haproxy 1.6 and 1.8 compatibility
* [x] add error state for "null" NSQ clients (use "?")
* [x] detect missing stats socket haproxy
* [x] task: validate missing haproxy stats socket config is live manually with knife ssh
* [x] task: validate missing haproxy stats socket is live manually with knife ssh
* [x] add "no consumer" NSQ detection
* [x] task: validate DEFUNCT-ness
* [x] multiple seeds

# V2.async
* [x] use asyncssh (10x faster!)
* [x] remove `return_exceptions=True`
* [x] use global BASTION connection (https://github.com/ronf/asyncssh/issues/270)
* [x] limit concurrency w/ semaphore
* [x] split to modules
* [x] re-use ssh connection for get-name/get-config calls
* [x] pass lightweight node-ref through async calls instead of node dict
* [x] remove pending node print
* [x] deal with formatting/output-ordering implications
* [x] convert recursive crawl from `await` to `ensure_future`
* [x] improve live output rendering
* [x] fix introduced parent['last_sibling'] bug
* [x] bug: cycle is correct in the tree, but rendering zombie children (only for first level cycles?)
* [x] retry ssh connection 3 times, fine tune concurrency
* [x] introduced: --output=stdout is now broken due to render_node_live
* [x] rename water to water_spout, private module function
* [x] consolidate `find..children` error checking
* [x] validate frontend-router
* [x] move connection semaphore to ssh_layer
* [x] better trace/debug log levels
* [x] consolidate nsq node relationships w/ multiple connections
* [x] deal w/ SSH config: bastion & username
* [x] refactors from PR review (reduce complexity, procedural styling)
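
The semaphore and retry items above can be sketched roughly as follows. This is only an illustration, not the project's code: `fake_connect`, `connect_with_retry`, `crawl_all`, and the two constants are hypothetical names, and a stand-in coroutine takes the place of a real asyncssh connection.

```python
import asyncio

MAX_CONCURRENCY = 2  # hypothetical cap on simultaneous connections
MAX_ATTEMPTS = 3     # mirrors "retry ssh connection 3 times"


async def fake_connect(host, failures):
    """Stand-in for an SSH connect; fails while failures[host] is above zero."""
    await asyncio.sleep(0)
    if failures.get(host, 0) > 0:
        failures[host] -= 1
        raise ConnectionError(f"could not reach {host}")
    return f"connected:{host}"


async def connect_with_retry(host, semaphore, failures):
    # The semaphore bounds how many connection attempts run at once.
    async with semaphore:
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                return await fake_connect(host, failures)
            except ConnectionError:
                if attempt == MAX_ATTEMPTS:
                    raise  # give up after the final attempt


async def crawl_all(hosts, failures):
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
    tasks = (connect_with_retry(h, semaphore, failures) for h in hosts)
    return await asyncio.gather(*tasks)


def run_demo():
    # Host "b" fails twice and succeeds on the third attempt.
    return asyncio.run(crawl_all(["a", "b", "c"], {"b": 2}))
```

Keeping the retry loop inside the semaphore means a retrying host holds its slot; releasing the slot between attempts would be an equally reasonable choice.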

# V2.features
* [x] DISPLAY: output in json
* [x] DISPLAY: load json file
* [x] DISPLAY: output in graphviz
* [x] DISPLAY: graphviz source
* [x] CRAWL: detect proxysql
* [x] CRAWL: cassandra
* [x] CRAWL: detect well known ports w/ netstat & AWS name lookup (cx, memcache, redis)
* [x] CRAWL: detect postgres well known port - causing trouble w/ name lookup
* [x] CRAWL: user defined links
* [x] move hints/skips to web.yaml
* [x] keep config.yaml
* [x] CRAWL: kinesis

# V2.refactor
* [x] move grouping of nsq topics to application layer, on service_name instead of IP
* [x] `config_errors` -> `warnings`, `crawl_errors` -> `errors`
* [x] refactor ssh config to ssh config file
* [x] refactor --hide-defunct to --skip-defunct and do not even (crawl)
* [x] graphviz warn/error color coding
* [x] remove "cruft" handling
* [x] add quick filter to rewrite service_name mysql-main-port_3306 to mysql-main-r/o
* [x] create objects or named tuples (dataclasses!)
* [x] PEP8, 120 line length
* [x] CHARLOTTE: make the `get_config` function into configurable parsers definable in YAML
* [x] charlotte: replace 'null' response from NSQ for missing IP w/ actual None response
* [x] charlotte: move crawl strategy exceptions (frontend-router) into charlotte
* [x] charlotte: move blocking logic to charlotte
* [x] charlotte: rename crawl_strategy -> crawl_provider on Node()
* [x] charlotte: move service_name_rewrite to charlotte
* [x] rename protocol_detail -> protocol_mux
* [x] CHARLOTTE: --skip-{name} arguments
* [x] --skip-defunct -> --hide-defunct
* [x] refactor database named matching to port matching
* [x] move skip services from globals to argparse
* [x] move crawl_complete, name_lookup_complete to node.py
* [x] charlotte config 1 file to directory of yaml files
* [x] create default yaml file for argparse
* [x] rename `ip` -> `instance_address`
* [x] remove crawl strategy object from Node, denormalize (protocol, blocking)
* [x] merge hints into pre-existing children w/ unknown address
* [x] CORE: add sub commands for ['crawl', 'render-json']
* [x] CORE (OSS): unit tests tests tests (round I - excluding `provider_*.py` and `crawl.py`)


# V2.bugs
* [x] BUG: nsq channels on same node are not grouping, again!
* [x] there is a regression in cycle detection - spider against async-cake-handler to repro
* [x] trim double quotes from service_name
* [x] BUG: crawl of well known port is discovering random connections to frontend-routers, ELBs - fixed by chris r. source ephemeral port filter
* [x] `'CYCLE': f"service '{node['service_name']}' discovered as a parent of itself!",`
* [x] paramiko nested exception outputting
* [x] handle actually null (absent value) nsq consumer in addition to string literal "null"
* [x] ascii renderer grouping by detail is persisting in memory (groupings)
* [x] charlotte: move name parser exceptions (mysql-main) into charlotte
* [x] we see many repeating group by service-name NSQ topic/channels repeating in ascii renderer
* [x] catch timeout for crawling children
* [x] remove trailing `_` from node_ref
* [x] graphviz blocking is backwards
* [x] regression defunct in parser check on num_connections == 0 is failing
* [x] differentiate RDS databases found in AWS - currently all show as `rdsnetworkinterface`
* [x] BUG: add __type__ to json serialization - currently brittle: key-ing off of random fields for deserialization
* [x] infinite recursion bug introduced by the crawl hints. it had to do with the cached_nodes in crawl.py being by_ref object and a deep-ish copy fixed
* [x] trying to crawl json that was outputted with --depth arg results in hanging `wait_for_crawl` to complete on nodes
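
For the `__type__` item above, one possible shape of tagged JSON serialization is sketched below. It is an assumption-laden illustration rather than the project's serializer: the field sets on `Node` and `Protocol` are reduced for the example, `ref` is a hypothetical field, and `TYPES`, `to_json`, and `from_json` are made-up names.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Node:
    service_name: str
    instance_address: str


@dataclass
class Protocol:
    ref: str        # hypothetical field, for illustration only
    blocking: bool


# Registry mapping the tag stored in JSON back to a class.
TYPES = {cls.__name__: cls for cls in (Node, Protocol)}


def to_json(obj):
    payload = asdict(obj)
    payload["__type__"] = type(obj).__name__  # explicit tag instead of guessing from fields
    return json.dumps(payload)


def from_json(text):
    payload = json.loads(text)
    cls = TYPES[payload.pop("__type__")]
    return cls(**payload)
```

With an explicit tag, deserialization no longer depends on which fields happen to be present, so adding a field to one class cannot make it look like another.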

# V3 Kubernetes++
* [x] CRAWL: kubernetes - take a hint
* [x] CRAWL: kubernetes - name lookup, crawl
* [x] support EKS cluster in a different AWS account than provider_aws


# V3.refactor
* [x] static code analysis (prospector) and forthcoming changes
* [x] refactor providers to objects, remove SSH logic from crawl.py
* [x] caching children in crawl.py instead of providers!!
* [x] fix TIMEOUT logic
* [x] put provider_args back in crawl strategies! use **kwargs to pass args in code
* [x] rewrite provider registration
* [x] move provider constant refs from constants.py into providers
* [x] rename errors.NULL_IP NULL_ADDRESS
* [x] refactor signature of `crawl_downstream` to include address
* [x] replace pass through node_ref in crawl w/ `zip()`
* [x] unit tests for crawl, providers, provider_*?
* [x] validate that crawl strategies are only used for specified providers
* [x] refactor lookup_name to remove life360 business logic from providers!
* [x] remove ProviderInterface::configure(), have ssh configure itself on first query
* [x] seed provider is configurable command line arg w/

# V3.features
* [x] FEATURE: make instance_provider args for aws hints part of a refactored "profile"
* [x] FEATURE: Distinguish kubernetes service shape in graphviz
* [ ] add --stop-on-nonblocking CLI arg

# V3.bugs
* [x] not respecting CrawlStrategy.providers
* [x] need to be able to configure different AWS profile for k8s/eks than for aws! (for dev)
* [x] BUG: intermittent timeout exceptions which do not result in program exit

# V4.VOSS
* [x] REFACTOR: (providers): providers as plugin architecture
* [x] REFACTOR (spider): --concurrency -> --ssh-concurrency OR provider args
* [x] REFACTOR: (all): refactor package architecture
* [x] TIMEOUT: (crawl) robust provider timeout and exception handling
* [x] OBSCURIFIER (render_*): obscurifier for output
* [ ] LOGGER: rewrite logger access for community standards
* [ ] PLUGPLAY: out of the box functionality by moving TCP to a "builtin" CrawlStrategy and using `hostname` or default service name
* [ ] REFACTOR: (providers): rewrite take_a_hint to not return a list, just return a single NodeTransport
* [ ] DOCS: rewrite docs in sphinx style and prepare for export to readthedocs.org
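
The TIMEOUT item above could look something like the following sketch, which wraps a provider call in `asyncio.wait_for` and records the timeout as an error instead of letting it escape. `slow_provider_call`, `crawl_with_timeout`, and the result dictionary layout are hypothetical, chosen only for illustration.

```python
import asyncio


async def slow_provider_call(delay):
    """Stand-in for a provider crawl that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return ["child-a", "child-b"]


async def crawl_with_timeout(delay, timeout):
    try:
        children = await asyncio.wait_for(slow_provider_call(delay), timeout=timeout)
        return {"children": children, "errors": []}
    except asyncio.TimeoutError:
        # Record the failure on the result so the rest of the crawl can continue.
        return {"children": [], "errors": ["TIMEOUT"]}
```

Returning the error as data keeps one slow node from aborting the whole crawl; whether the real code should instead exit is a separate decision.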

# Backlog

## New Features

## Core
* [ ] RENDER_PLUGINS: make renderers an abstract class w/ plugins
* [ ] REFACTOR: move seed logic out of ./spider.py
* [ ] REFACTOR: revisit the Node{Protocol, CrawlStrategy, protocol_mux} object relationship strategy
* [ ] FEATURE: track whether a node was skipped for crawling and display as such in graphviz
* [ ] REFACTOR: move errors/warnings to a global config
* [ ] REFACTOR: do not block crawl() on lookup_name() in main crawl loop. will speed up many times
* [ ] REFACTOR: move mutex from provider_ssh to crawl.py
* [ ] BUG: intermittent timeouts crawling the whole tree - add retry to lookup_name/crawl_downstream?
* [ ] BUG: remove `blocking` from CrawlStrategy - it should only be in Protocol
* [ ] BUG: where is `elasticache-time-points`? crawl-netstat only takes 1 ip per port, so for async-soa which has 2 downstreams on 6379, it can't find
* [ ] BUG: where is `cx-dvb`??
* [ ] REFACTOR: consolidate Node::crawl_complete and crawl.py::_crawlable()
* [ ] BUG: required args showing as optional in --help
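
The item about not blocking `crawl()` on `lookup_name()` might be approached along these lines. This is a sketch under a strong assumption: that the downstream crawl does not need the looked-up name first. The function bodies are stand-ins and only the names `lookup_name` and `crawl_downstream` come from the notes above.

```python
import asyncio


async def lookup_name(address):
    """Stand-in for a slow name lookup."""
    await asyncio.sleep(0.05)
    return f"service-{address}"


async def crawl_downstream(address):
    """Stand-in for discovering children of a node."""
    await asyncio.sleep(0.05)
    return [f"{address}-child"]


async def crawl(address):
    # Run both awaits concurrently rather than one after the other.
    name, children = await asyncio.gather(
        lookup_name(address),
        crawl_downstream(address),
    )
    return name, children
```

If crawl strategies are selected by service name, the two calls cannot be fully overlapped for the same node, and the gain would have to come from overlapping lookups across different nodes instead.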

## Render Ascii
* [ ] FEATURE: merge hints in ascii output

## Render Graphviz
* [ ] FEATURE: multiple seeds display with equal ranking
* [ ] FEATURE: nsq topics as nodes rather than edges
* [ ] FEATURE: visualize cycles
* [ ] FEATURE: different visualization for cache vs database
* [ ] FEATURE: create a legend

## Render JSON

## Render New
* [ ] DISPLAY: output in vizceral format
* [ ] DISPLAY: 'diff' run on multiple seed nodes and diff!

## CrawlStrategies
* [ ] BUG: HAProxy: functionality to detect bad HAProxy Config as a crawl error was lost in async refactor `if stdout.startswith('ERROR:'): return 'CRAWL ' + stdout.replace("\n","\t"), {}`
* [ ] BUG: NSQ: misconfigured clients have null server (this is why we don't see rattail -> relapse), investigate & resolve
* [ ] FEATURE: Netstat: use matchAddress for HAProxy crawl strategies to avoid timeout to RDS hostnames
* [ ] FEATURE: crawl downstream - ability to specify more providers args per provider (so that k8s can selectively crawl containers)


## Provider SSH
* [ ] FEATURE: revisit whether `occupy_one_sempahore_space` is working (to dynamically configure --concurrency)
* [ ] FEATURE: still getting ssh connection errors sometimes without --concurrency=10
* [ ] FEATURE: configurable "~/.ssh/config" SSH profile
* [ ] REFACTOR (provider_ssh): we shouldn't use known_hosts=None for security reasons

## Provider AWS
* [ ] FEATURE: lookup_name is slow, use async
* [ ] CRAWL: dynamodb
* [ ] CRAWL: SQS

## Charlotte
* [ ] FEATURE (charlotte): yaml validation by schema

## Web

# Trash Can
* [ ] backwards compatibility for haproxy w/out stats socket
* [ ] detect live traffic netstat/tcpdump/ebpf? (this was solved by using haproxy stats)
* [ ] remove crawl_strategy from Node()
