ldanilek committed Nov 6, 2024
1 parent f32eec8 commit 473e136
Showing 5 changed files with 156 additions and 112 deletions.
224 changes: 128 additions & 96 deletions README.md
@@ -13,20 +13,36 @@ Suppose you have a leaderboard of game scores. These are some operations
that the Aggregate component makes easy and efficient:

1. Count the total number of scores: `aggregate.count(ctx)`
2. Count the number of scores greater than 65: `aggregate.count(ctx, { lower: { key: 65, inclusive: false } })`
2. Count the number of scores greater than 65: `aggregate.count(ctx, { bounds: { lower: { key: 65, inclusive: false } } })`
3. Find the p95 score: `aggregate.at(ctx, Math.floor((await aggregate.count(ctx)) * 0.95))`
4. Find the overall average score: `(await aggregate.sum(ctx)) / (await aggregate.count(ctx))`
5. Find the ranking for a score of 65 in the leaderboard: `aggregate.indexOf(ctx, 65)`
6. Find the average score for an individual user. You can define another aggregate
partitioned by user and aggregate within each:
grouped by user and aggregate within each:

```ts
// aggregateScoreByUser is the leaderboard scores partitioned by username.
// aggregateScoreByUser is the leaderboard scores grouped by username.
const bounds = { prefix: [username] };
const highScoreForUser = aggregateScoreByUser.max(ctx, bounds);
const highScoreForUser = await aggregateScoreByUser.max(ctx, { bounds });
const avgScoreForUser =
aggregateScoreByUser.sum(ctx, bounds) /
aggregateScoreByUser.count(ctx, bounds);
await aggregateScoreByUser.sum(ctx, { bounds }) /
await aggregateScoreByUser.count(ctx, { bounds });
// It still enables adding or averaging all scores across all usernames.
const globalAverageScore = await aggregateScoreByUser.sum(ctx) /
await aggregateScoreByUser.count(ctx);
```

7. Alternatively, you can define a third aggregate with separate namespaces,
and do the same query. This method increases throughput because a user's data
won't interfere with other users. However, you lose the ability to aggregate
over all users.

```ts
const forUser = { namespace: username };
const highScoreForUser = await aggregateScoreByUser.max(ctx, forUser);
const avgScoreForUser =
await aggregateScoreByUser.sum(ctx, forUser) /
await aggregateScoreByUser.count(ctx, forUser);
```
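For intuition about what each of these calls computes, here is a toy, in-memory model. It is a sketch under stated assumptions, not the `@convex-dev/aggregate` API: a sorted array of numbers stands in for the component's internal data structure, and the class `SortedScores` and its methods (including `countGreaterThan`) are hypothetical names chosen for illustration. Positions are 0-based in this sketch.

```typescript
// Hypothetical in-memory model of the leaderboard queries above. This is
// NOT the @convex-dev/aggregate API: a sorted array stands in for the
// component's internal data structure, so inserts here cost O(n).
class SortedScores {
  private scores: number[] = [];

  // Index of the first score >= key (or > key when strict is true).
  private bound(key: number, strict: boolean): number {
    let lo = 0;
    let hi = this.scores.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      const before = strict ? this.scores[mid] <= key : this.scores[mid] < key;
      if (before) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  insert(score: number): void {
    // Binary search finds the slot; splice then shifts later elements.
    this.scores.splice(this.bound(score, false), 0, score);
  }

  count(): number {
    return this.scores.length;
  }

  // Plays the role of count with bounds { lower: { key, inclusive: false } }.
  countGreaterThan(key: number): number {
    return this.scores.length - this.bound(key, true);
  }

  // Plays the role of at(i): the score at 0-based position i, ascending.
  at(i: number): number {
    return this.scores[i];
  }

  // Plays the role of indexOf(key): the number of scores strictly below key.
  indexOf(key: number): number {
    return this.bound(key, false);
  }

  sum(): number {
    return this.scores.reduce((total, s) => total + s, 0);
  }
}

const board = new SortedScores();
for (const s of [40, 65, 70, 90, 65, 10]) board.insert(s);
// 6 scores: floor(6 * 0.95) = 5, the last position.
const p95 = board.at(Math.floor(board.count() * 0.95));
const average = board.sum() / board.count();
```

The binary searches show why rank and count queries over sorted data can be logarithmic; the array splice is the part a balanced tree avoids.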

The Aggregate component provides `O(log(n))`-time lookups, instead of the `O(n)`
@@ -51,10 +67,9 @@ The keys may be arbitrary Convex values, so you can choose to sort your data by:
4. Nothing, use `key=null` for everything if you just want
[a total count, such as for random access](#total-count-and-randomization).

### Partitioning
### Grouping

You can use sorting to partition your data set, enabling namespacing,
multitenancy, sharding, and more.
You can use sorting to group your data set.

If you want to keep track of multiple games with scores for each user,
use a tuple of `[game, username, score]` as the key.
@@ -76,6 +91,52 @@ would need to aggregate with key `[game, score]`.
To support different sorting and partitioning keys, you can define multiple
instances. See [below](#defining-multiple-aggregates) for details.
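To see why a tuple key like `[game, username, score]` supports grouping, it helps to look at how tuples sort. The sketch below assumes keys made only of strings and numbers, with the same type at each position; the helper names `compareKeys`, `hasPrefix`, and `countWithPrefix` are hypothetical, and the component's own ordering of arbitrary Convex values is not reproduced here.

```typescript
// Hypothetical sketch of tuple-key ordering. Assumes every key part at the
// same position has the same type (all strings or all numbers); the real
// component orders arbitrary Convex values, which this does not attempt.
type KeyPart = string | number;
type Key = KeyPart[];

function comparePart(a: KeyPart, b: KeyPart): number {
  if (typeof a === "number" && typeof b === "number") return a - b;
  const sa = String(a);
  const sb = String(b);
  return sa < sb ? -1 : sa > sb ? 1 : 0;
}

// Lexicographic order: the first differing part decides; if one key is a
// prefix of the other, the shorter key sorts first.
function compareKeys(a: Key, b: Key): number {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const c = comparePart(a[i], b[i]);
    if (c !== 0) return c;
  }
  return a.length - b.length;
}

function hasPrefix(key: Key, prefix: Key): boolean {
  return prefix.length <= key.length && prefix.every((part, i) => key[i] === part);
}

// In sorted order, all keys sharing a prefix form one contiguous run, which
// is why a prefix bound can be answered as a single range. This linear scan
// only illustrates the result; it is not a range query.
function countWithPrefix(sorted: Key[], prefix: Key): number {
  return sorted.filter((key) => hasPrefix(key, prefix)).length;
}

// [game, username, score] keys, as in the multi-game leaderboard above.
const keys: Key[] = [
  ["chess", "bob", 1200],
  ["football", "alice", 3],
  ["chess", "alice", 1500],
  ["chess", "alice", 1350],
];
keys.sort(compareKeys);
```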

If you separate your data via the `sortKey` and `prefix` bounds, you can look at
your data at any level of granularity. You can do a global `count` to see how many
data points there are in total, or you can zero in on an individual group of the data.

However, there's a tradeoff: nearby data points can interfere with each other
in the internal data structure, reducing throughput. See
[below](#read-dependencies-and-writes) for more details. To avoid interference,
you can use Namespaces.

### Namespacing

If your data is separated into distinct partitions, and you don't need to
aggregate between partitions, then you can put each partition into its own
namespace. Each namespace gets its own internal data structure.

If your app has multiple games, it's not useful to aggregate scores across
different games. The scoring system for chess isn't related to the scoring
system for football. So we can namespace our scores based on the game.

Whenever we aggregate scores, we *must* specify the namespace.
In exchange, the internal aggregation data structure can keep the scores
separate and keep throughput high.

Here's how you would create the aggregate we just described:

```ts
const leaderboardByGame = new TableAggregate<{
namespace: Id<"games">,
key: number,
dataModel: DataModel,
tableName: "scores",
}>(components.leaderboardByGame, {
namespace: (doc) => doc.gameId,
sortKey: (doc) => doc.score,
});
```

And whenever you use this aggregate, you specify the namespace.

```ts
const footballHighScore = await leaderboardByGame.max(ctx, { namespace: footballId });
```

See an example of a namespaced aggregate in
[example/convex/photos.ts](./example/convex/photos.ts).
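One way to picture namespacing is as a map from each namespace to its own sorted structure. This is a hypothetical in-memory model (`NamespacedScores` is not part of the component): reads must name a namespace, and writes to one namespace never touch another's list, which is the intuition behind the throughput benefit described above.

```typescript
// Hypothetical in-memory model of namespacing: each namespace owns an
// independent sorted list, so a write to one namespace never touches the
// data read by another. Not the @convex-dev/aggregate API.
class NamespacedScores<N> {
  private byNamespace = new Map<N, number[]>();

  insert(namespace: N, score: number): void {
    const scores = this.byNamespace.get(namespace) ?? [];
    // Keep ascending order: insert before the first larger score.
    let i = 0;
    while (i < scores.length && scores[i] <= score) i++;
    scores.splice(i, 0, score);
    this.byNamespace.set(namespace, scores);
  }

  // Every read names exactly one namespace; there is deliberately no
  // method that totals across namespaces.
  count(namespace: N): number {
    return this.byNamespace.get(namespace)?.length ?? 0;
  }

  max(namespace: N): number | undefined {
    const scores = this.byNamespace.get(namespace) ?? [];
    return scores.length > 0 ? scores[scores.length - 1] : undefined;
  }
}

const scoresByGame = new NamespacedScores<string>();
scoresByGame.insert("football", 3);
scoresByGame.insert("chess", 1500);
scoresByGame.insert("football", 7);
```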

### More examples

The Aggregate component can efficiently calculate all of these:
@@ -149,9 +210,15 @@ import { DataModel } from "./_generated/dataModel";
import { mutation as rawMutation } from "./_generated/server";
import { TableAggregate } from "@convex-dev/aggregate";

const aggregate = new TableAggregate<number, DataModel, "mytable">(
const aggregate = new TableAggregate<{
namespace: undefined,
key: number,
dataModel: DataModel,
tableName: "mytable",
}>(
components.aggregate,
{
namespace: (doc) => undefined, // disable namespacing.
sortKey: (doc) => doc._creationTime, // Allows querying across time ranges.
sumValue: (doc) => doc.value, // The value to be used in `.sum` calculations.
}
@@ -167,12 +234,14 @@ here's how you might define `aggregateByGame`, as an aggregate on the "scores"
table:

```ts
const aggregateByGame = new TableAggregate<
[Id<"games">, string, number],
DataModel,
"leaderboard"
>(components.aggregateByGame, {
sortKey: (doc) => [doc.gameId, doc.username, doc.score],
const aggregateByGame = new TableAggregate<{
namespace: Id<"games">,
key: [string, number],
dataModel: DataModel,
tableName: "scores",
}>(components.aggregateByGame, {
namespace: (doc) => doc.gameId,
sortKey: (doc) => [doc.username, doc.score],
});
```

Expand Down Expand Up @@ -234,75 +303,27 @@ To run the examples:
4. The dashboard should open and you can run functions like
`leaderboard:addScore` and `leaderboard:userAverageScore`.

### Namespaces

When you have independent data sets, use `namespaces` for greater throughput.
A namespace is a segment of your data points, like all users within a team,
or all metrics related to a user.
It behaves similarly to using a prefix on `sortKey`, but more efficiently.
By dividing your data into namespaces, you can read data more efficiently,
since your queries will never be invalidated due to writes in other namespaces.
Writes between namespaces will never conflict, reducing chances of write contention
resulting in slowdowns and OCC failure.
The limitation is that you cannot calculate aggregates across namespaces.
If you need to aggregate across top-level segments, use `sortKey` with a prefix.

For example, suppose you have a bunch of leaderboard scores for several games,
and the scores for each game are independent. You can use the game id as a
namespace. Then each game gets its own data structure in the aggregate
component, preventing reads and writes for different games from conflicting with each other.

```ts
const aggregateByGame = new NamespacedTableAggregate<
[string, number],
DataModel,
"leaderboard",
Id<"games">
>(components.aggregateByGame, {
sortKey: (doc) => [doc.username, doc.score],
namespace: (doc) => doc.gameId,
});
```

Now when you need to aggregate within a game, you call `.get` to narrow down the
computation to a single game.

```ts
const countTimesGamePlayed = await aggregateByGame.get(gameId).count();
```

There are namespaced classes for each kind of Aggregate you may want to build:
`NamespacedTableAggregate`, `NamespacedRandomize`, and
`NamespacedDirectAggregate`.

### Total Count and Randomization

If you don't need the ordering, partitioning, or summing behavior of
`TableAggregate`, there's a simpler interface you can use: `Randomize`.
`TableAggregate`, you can set `namespace: undefined` and `sortKey: null`.

```ts
import { components } from "./_generated/api";
import { DataModel } from "./_generated/dataModel";
import { mutation as rawMutation } from "./_generated/server";
import { Randomize } from "@convex-dev/aggregate";
import { customMutation } from "convex-helpers/server/customFunctions";
// This is like TableAggregate but there's no key or sumValue.
const randomize = new Randomize<DataModel, "mytable">(components.aggregate);

// In a mutation, insert into the component when you insert into your table.
const id = await ctx.db.insert("mytable", data);
await randomize.insert(ctx, id);

// As before, delete from the component when you delete from your table
await ctx.db.delete(id);
await randomize.delete(ctx, id);

// in a query, get the total document count.
const totalCount = await randomize.count(ctx);
// get a random document's id.
const randomId = await randomize.random(ctx);
const randomize = new TableAggregate<{
namespace: undefined,
key: null,
dataModel: DataModel,
tableName: "mytable",
}>(components.aggregate, {
namespace: (doc) => undefined,
sortKey: (doc) => null,
});
```

Without sorting, all documents are ordered by their `_id`, which is generally
random. You can look up the document at any index to pick one at random
or to shuffle the whole table.

See more examples in [`example/convex/shuffle.ts`](example/convex/shuffle.ts),
including a paginated random shuffle of some music.
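Picking a random document from an aggregate with `key=null` comes down to choosing a uniform index in `[0, count)` and looking up the item at that index. The sketch below is hypothetical: `count` and `lookupAt` stand in for the aggregate's count and index lookup, and the `random` parameter is injectable so results can be reproduced. The shuffle uses the standard Fisher-Yates algorithm over indices, which may differ from what `example/convex/shuffle.ts` does.

```typescript
// Hypothetical helpers, not the component API: `count` and `lookupAt`
// stand in for the aggregate's document count and index-based lookup.
function randomItem<T>(
  count: number,
  lookupAt: (index: number) => T,
  random: () => number = Math.random,
): T | undefined {
  if (count === 0) return undefined;
  // random() returns a value in [0, 1), so the index lands in [0, count).
  return lookupAt(Math.floor(random() * count));
}

// Fisher-Yates shuffle of the positions 0..count-1. Visiting the positions
// in this order and looking each one up yields every item once, in random
// order.
function shuffledIndices(count: number, random: () => number = Math.random): number[] {
  const order = Array.from({ length: count }, (_, i) => i);
  for (let i = count - 1; i > 0; i--) {
    // Pick j uniformly from 0..i and swap it into position i.
    const j = Math.floor(random() * (i + 1));
    [order[i], order[j]] = [order[j], order[i]];
  }
  return order;
}
```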

@@ -313,27 +334,36 @@ Convex supports infinite-scroll pagination which is
to worry about items going missing from your list. But sometimes you want to
display separate pages of results on separate pages of your app.

For this example, imagine you have a table of photos
For this example, imagine you have a table of photos, grouped into albums.

```ts
// convex/schema.ts
defineSchema({
photos: defineTable({
album: v.string(),
url: v.string(),
}),
}).index("by_album_creation_time", ["album"]),
});
```

And an aggregate defined with key as `_creationTime`.
And an aggregate defined with key as `_creationTime` and namespace as `album`.

```ts
// convex/convex.config.ts
app.use(aggregate, { name: "photos" });

// convex/photos.ts
const photos = new TableAggregate<number, DataModel, "photos">(
const photos = new TableAggregate<{
namespace: string, // album name
key: number, // creation time
dataModel: DataModel,
tableName: "photos",
}>(
components.photos,
{ sortKey: (doc) => doc._creationTime }
{
namespace: (doc) => doc.album,
sortKey: (doc) => doc._creationTime,
}
);
```

@@ -342,15 +372,15 @@ map from offset to an index key.

In this example, if `offset` is 100 and `numItems` is 10, we get the
`_creationTime` at offset 100 (in ascending order) and, starting there, we get the next ten
documents.
documents. In this way we can paginate through the whole photo album.
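The two steps described above (offset to key via the aggregate, then key to page via the index) can be modeled in memory. This is a hypothetical sketch: `Photo` and `pageAtOffset` are illustrative names, creation times are assumed unique within an album, and returning an empty page past the end is this sketch's own choice rather than documented component behavior.

```typescript
// Hypothetical in-memory sketch of offset-based pagination. Assumes
// creation times are unique within an album; documents sharing a
// timestamp would need a tie-breaker such as _id.
interface Photo {
  album: string;
  creationTime: number;
  url: string;
}

function pageAtOffset(
  photos: Photo[],
  album: string,
  offset: number,
  numItems: number,
): string[] {
  // Ascending by creation time within one album, like the
  // by_album_creation_time index.
  const inAlbum = photos
    .filter((p) => p.album === album)
    .sort((a, b) => a.creationTime - b.creationTime);
  // Step 1 (the aggregate's role): turn the offset into a key.
  // This sketch returns an empty page past the end of the album.
  if (offset >= inAlbum.length) return [];
  const key = inAlbum[offset].creationTime;
  // Step 2 (the index's role): read numItems photos starting at that key.
  return inAlbum
    .filter((p) => p.creationTime >= key)
    .slice(0, numItems)
    .map((p) => p.url);
}
```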

```ts
export const pageOfPhotos = query({
args: { offset: v.number(), numItems: v.number() },
handler: async (ctx, { offset, numItems }) => {
const { key } = await photos.at(ctx, offset);
args: { offset: v.number(), numItems: v.number(), album: v.string() },
handler: async (ctx, { offset, numItems, album }) => {
const { key } = await photos.at(ctx, offset, { namespace: album });
return await ctx.db.query("photos")
.withIndex("by_creation_time", q=>q.gte("_creationTime", key))
.withIndex("by_album_creation_time", q=>q.eq("album", album).gte("_creationTime", key))
.take(numItems);
},
});
@@ -369,19 +399,21 @@ insert, delete, and replace operations yourself.
import { components } from "./_generated/api";
import { DataModel } from "./_generated/dataModel";
import { DirectAggregate } from "@convex-dev/aggregate";
// The first generic parameter (number in this case) is the key.
// The second generic parameter (string in this case) should be unique to
// be a tie-breaker in case two data points have the same key.
const aggregate = new DirectAggregate<number, string>(components.aggregate);
// Note: `id` should be unique so it can break ties when two data points
// have the same key.
const aggregate = new DirectAggregate<{
key: number,
id: string,
}>(components.aggregate);

// within a mutation, add values to be aggregated
await aggregate.insert(ctx, key, id);
await aggregate.insert(ctx, { key, id });
// if you want to use `.sum` to aggregate sums of values, insert with a sumValue
await aggregate.insert(ctx, key, id, sumValue);
await aggregate.insert(ctx, { key, id, sumValue });
// or delete values that were previously added
await aggregate.delete(ctx, key, id);
await aggregate.delete(ctx, { key, id });
// or update values
await aggregate.replace(ctx, oldKey, newKey, id);
await aggregate.replace(ctx, { key: oldKey, id }, { key: newKey });
```

See [`example/convex/stats.ts`](example/convex/stats.ts) for an example.
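The `(key, id)` identity used by `DirectAggregate` can be modeled in a few lines. This is a hypothetical in-memory sketch, not the component: `DirectModel`, `compareEntries`, and `ids()` are illustrative names, a missing `sumValue` is treated as 0, and `replace` keeps the old `sumValue` unless a new one is given. Those last two are the sketch's own choices, not documented behavior.

```typescript
// Hypothetical in-memory model of DirectAggregate-style bookkeeping:
// entries are ordered by key, and `id` breaks ties between equal keys.
// Not the component's implementation; sumValue defaults to 0 here.
interface EntryRef {
  key: number;
  id: string;
}
interface Entry extends EntryRef {
  sumValue: number;
}

function compareEntries(a: EntryRef, b: EntryRef): number {
  if (a.key !== b.key) return a.key - b.key;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

class DirectModel {
  private entries: Entry[] = [];

  private find(ref: EntryRef): number {
    return this.entries.findIndex((e) => compareEntries(e, ref) === 0);
  }

  insert(ref: EntryRef & { sumValue?: number }): void {
    if (this.find(ref) !== -1) throw new Error(`duplicate ${ref.key}/${ref.id}`);
    this.entries.push({ key: ref.key, id: ref.id, sumValue: ref.sumValue ?? 0 });
    this.entries.sort(compareEntries);
  }

  delete(ref: EntryRef): void {
    const i = this.find(ref);
    if (i === -1) throw new Error(`missing ${ref.key}/${ref.id}`);
    this.entries.splice(i, 1);
  }

  // Moves an entry to a new key under the same id. In this sketch the old
  // sumValue is kept unless a new one is given.
  replace(old: EntryRef, next: { key: number; sumValue?: number }): void {
    const i = this.find(old);
    if (i === -1) throw new Error(`missing ${old.key}/${old.id}`);
    const moved: Entry = {
      key: next.key,
      id: old.id,
      sumValue: next.sumValue ?? this.entries[i].sumValue,
    };
    const j = this.find(moved);
    if (j !== -1 && j !== i) throw new Error(`duplicate ${moved.key}/${moved.id}`);
    this.entries[i] = moved;
    this.entries.sort(compareEntries);
  }

  count(): number {
    return this.entries.length;
  }

  sum(): number {
    return this.entries.reduce((total, e) => total + e.sumValue, 0);
  }

  // Ids in sorted (key, id) order.
  ids(): string[] {
    return this.entries.map((e) => e.id);
  }
}
```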
4 changes: 2 additions & 2 deletions example/convex/photos.ts
@@ -84,8 +84,8 @@ export const pageOfPhotos = query({
const { key: firstPhotoCreationTime } = await photos.at(ctx, offset, { namespace: album });
const photoDocs = await ctx.db
.query("photos")
.withIndex("by_creation_time", (q) =>
q.gte("_creationTime", firstPhotoCreationTime)
.withIndex("by_album_creation_time", (q) =>
q.eq("album", album).gte("_creationTime", firstPhotoCreationTime)
)
.take(numItems);
return photoDocs.map((doc) => doc.url);
2 changes: 1 addition & 1 deletion example/convex/schema.ts
@@ -12,5 +12,5 @@ export default defineSchema({
photos: defineTable({
album: v.string(),
url: v.string(),
}),
}).index("by_album_creation_time", ["album"]),
});
6 changes: 5 additions & 1 deletion example/convex/stats.ts
@@ -7,7 +7,11 @@ import { v } from "convex/values";
import { DirectAggregate } from "@convex-dev/aggregate";
import { components } from "./_generated/api";

const stats = new DirectAggregate<number, string>(components.stats);
const stats = new DirectAggregate<{
namespace: undefined,
key: number,
id: string,
}>(components.stats);

export const reportLatency = mutation({
args: {