
assignment-3-modeling-attn

For this assignment, we continue to design modeling tasks to help you gain a deeper understanding of the transformer's component modules.

Specifically, we will focus on one of the pivotal layers that form the backbone of the transformer structure: the attention layer.

Tasks

Task 1: Offline Sliding-Window Attention

Please read the description here.
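
As context for this task, below is a minimal, illustrative sketch of offline (full-sequence) sliding-window attention, assuming a causal window in which each query position attends only to itself and the previous window_size - 1 positions. The function name, tensor shapes, and window convention are assumptions for illustration only; the linked description is authoritative.

    import torch
    import torch.nn.functional as F

    def offline_sliding_window_attention(q, k, v, window_size):
        """Full-sequence attention where query position i only attends to
        key positions j with i - window_size < j <= i (causal sliding window).

        q, k, v: tensors of shape (batch, num_heads, seq_len, head_dim).
        window_size: number of visible positions, including the current one.
        """
        seq_len = q.size(-2)
        scale = q.size(-1) ** -0.5

        # Build a causal sliding-window mask: True where attention is allowed.
        idx = torch.arange(seq_len, device=q.device)
        rel = idx[:, None] - idx[None, :]            # rel[i, j] = i - j
        mask = (rel >= 0) & (rel < window_size)      # in the window, not in the future

        scores = (q @ k.transpose(-2, -1)) * scale   # (batch, heads, seq, seq)
        scores = scores.masked_fill(~mask, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        return attn @ v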

Task 2: Online Sliding-Window Attention

Please read the description here.
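
One common reading of "online" attention, assumed here, is the FlashAttention-style formulation: keys and values are processed block by block with a running (online) softmax, so the full score row is never materialized. The sketch below illustrates that idea for a single query position under a causal sliding window; the function name, shapes, and block size are hypothetical, and the linked description is authoritative.

    import torch

    def online_sliding_window_attention_row(q_i, k, v, i, window_size, block_size=128):
        """Attention output for query position i, computed block by block with a
        running softmax (max m, normalizer l, weighted-value accumulator acc).

        q_i: (head_dim,) query at position i; k, v: (seq_len, head_dim).
        Only keys j with i - window_size < j <= i contribute.
        """
        scale = q_i.size(-1) ** -0.5
        m = torch.tensor(float("-inf"))       # running max of scores seen so far
        l = torch.tensor(0.0)                 # running sum of exp(score - m)
        acc = torch.zeros_like(q_i)           # running weighted sum of values

        lo = max(0, i - window_size + 1)      # first key position inside the window
        for start in range(lo, i + 1, block_size):
            end = min(start + block_size, i + 1)
            s = (k[start:end] @ q_i) * scale  # scores for this key block
            m_new = torch.maximum(m, s.max())
            # Rescale previous accumulators to the new max, then add this block.
            alpha = torch.exp(m - m_new)
            p = torch.exp(s - m_new)
            l = l * alpha + p.sum()
            acc = acc * alpha + p @ v[start:end]
            m = m_new
        return acc / l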

Environment

  • You should have Python 3.10+ installed on your machine (a quick sanity-check snippet follows this list).
  • (Optional) You should ideally have NVIDIA GPU(s) with CUDA 12.0+ installed on your machine; otherwise, some features may not work properly. (We will do our best to ensure that differences in hardware do not affect your score.)
  • Install all the necessary dependencies with the following command, which may vary slightly among assignments:
    pip install -r requirements.txt
  • (Optional) To avoid dependency conflicts, we strongly recommend using a Docker image from the NVIDIA PyTorch Release (e.g., 23.10 or newer) as your base environment.
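
Assuming PyTorch is among the dependencies in requirements.txt, a quick sanity check of the environment might look like this (the expected versions mirror the bullets above):

    import sys
    import torch

    # Quick sanity check of the environment described above.
    print("Python:", sys.version.split()[0])        # expect 3.10 or newer
    print("PyTorch:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("CUDA version:", torch.version.cuda)  # expect 12.0 or newer
        print("Device:", torch.cuda.get_device_name(0))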