v0.1.2
- Support native Windows. #4
- Support multiple GPUs. #8
- Support llamafile as a linear backend.
- Support new models: Mixtral 8x7B and 8x22B.
- Support q2k, q3k, and q5k dequantization on GPU. #16
- Add a GitHub Action to build precompiled packages.
- Support shared memory across different operators.
- Fix some bugs when building from source. #23