Commit
update main readme + moe readme
Vicba committed Sep 3, 2024
1 parent 5354f74 commit cfc69e3
Showing 2 changed files with 18 additions and 3 deletions.
7 changes: 4 additions & 3 deletions README.md
@@ -36,11 +36,12 @@ To get started with the code in this repository, follow these steps:

Here's a list of the white papers and their corresponding implementations available in this repository:

- **["Attention is all you need" paper](https://arxiv.org/abs/1706.03762)**: Introduces the Transformer model, which uses self-attention to focus on important words in a sentence, making it faster and better at understanding long sentences compared to older models.
- **["An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"](https://arxiv.org/abs/2010.11929)**: Visual transformer (ViT) is a type of neural network that uses self-attention to process images for classification tasks.

- **["Attention is All You Need" paper](https://arxiv.org/abs/1706.03762)**: Introduces the Transformer model, which uses self-attention to focus on important words in a sentence, making it faster and better at understanding long sentences compared to older models.
- **["An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"](https://arxiv.org/abs/2010.11929)**: Presents the Visual Transformer (ViT), a neural network that applies self-attention to image classification tasks.
- **Mixture of Experts (various papers)**: Describes a neural network architecture that uses multiple specialized models (experts) and a gating mechanism to improve performance and adaptability by activating only the most relevant experts for a given task.
Each implementation is located in its own directory, which includes:

-Each one contains:
- **`README.md`**: Detailed instructions on how to use the code for that specific paper.
<!-- - **`main.py`**: The main script to run the implementation. -->
- **`requirements.txt`**: Python dependencies required for that implementation.
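
For reference, the scaled dot-product self-attention operation mentioned in the first two entries of the paper list above reduces to a few lines. The sketch below (plain NumPy, with made-up variable names; illustrative only, not code from this repository) shows a single attention head:

```python
# Illustrative sketch of single-head scaled dot-product self-attention (not this repo's code).
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_k) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # how strongly each token attends to every other
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability before softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over each row
    return weights @ V                                # attention-weighted sum of the value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                          # 5 tokens, 16-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)            # (5, 16)
```
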
14 changes: 14 additions & 0 deletions moe/README.md
@@ -0,0 +1,14 @@
# Mixture Of Experts (MoE)

![MoE](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/moe/00_switch_transformer.png)

A Mixture of Experts model improves performance by combining a set of specialized sub-models, or "experts," with a gating mechanism that routes each input to them. Only a subset of the experts is activated for any given input, which saves computation and improves efficiency.

> Note: my code is a simplified version; it does not add or handle noise, randomness, etc.

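For illustration, a bare-bones top-1 gated MoE layer along the lines described above could look like the sketch below (PyTorch; the class name `SimpleMoE` and its parameters are hypothetical and this is not the implementation in this directory):

```python
# Illustrative sketch only: a minimal top-1 gated MoE layer, not this repo's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):  # hypothetical name, not from this repo
    def __init__(self, dim, num_experts=4, hidden=64):
        super().__init__()
        # Each expert is a small independent feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )
        # The gating network scores every expert for each input.
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x):                          # x: (batch, dim)
        scores = F.softmax(self.gate(x), dim=-1)   # (batch, num_experts)
        weight, expert_idx = scores.max(dim=-1)    # route each input to its best-scoring expert
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i                 # inputs routed to expert i
            if mask.any():
                out[mask] = weight[mask].unsqueeze(-1) * expert(x[mask])
        return out

print(SimpleMoE(dim=32)(torch.randn(8, 32)).shape)  # torch.Size([8, 32])
```
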
Resources:
- https://huggingface.co/blog/moe
- https://cameronrwolfe.substack.com/p/conditional-computation-the-birth
- https://github.com/lucidrains/mixture-of-experts
- https://www.youtube.com/watch?v=0U_65fLoTq0
