init: initial commit
thannaske committed Feb 28, 2025
0 parents commit 8b544fe
Showing 13 changed files with 1,309 additions and 0 deletions.
74 changes: 74 additions & 0 deletions .github/workflows/build.yml
@@ -0,0 +1,74 @@
name: Build and Release

on:
  push:
    branches: [ main ]
    tags: [ 'v*' ]
  pull_request:
    branches: [ main ]

jobs:
  build:
    name: Build
    runs-on: ubuntu-latest
    strategy:
      matrix:
        goos: [linux, darwin, windows]
        goarch: [amd64, arm64]
        exclude:
          # Exclude Windows on ARM64 as it's less common
          - goos: windows
            goarch: arm64
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Set up Go
        uses: actions/setup-go@v4
        with:
          go-version: '1.21'
          check-latest: true

      - name: Build binary
        env:
          GOOS: ${{ matrix.goos }}
          GOARCH: ${{ matrix.goarch }}
        run: |
          # Set the output binary name with an extension based on the OS
          EXT=""
          if [ "${{ matrix.goos }}" = "windows" ]; then
            EXT=".exe"
          fi
          # Build the binary
          go build -v -o "s3usage-${{ matrix.goos }}-${{ matrix.goarch }}${EXT}" ./cmd/s3usage

      - name: Upload build artifact
        uses: actions/upload-artifact@v3
        with:
          name: s3usage-${{ matrix.goos }}-${{ matrix.goarch }}
          path: s3usage-${{ matrix.goos }}-${{ matrix.goarch }}*
          if-no-files-found: error

  release:
    name: Create Release
    needs: build
    if: startsWith(github.ref, 'refs/tags/')
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Download artifacts
        uses: actions/download-artifact@v3
        with:
          path: ./artifacts

      - name: Create release
        id: create_release
        uses: softprops/action-gh-release@v1
        with:
          files: ./artifacts/**/*
          draft: false
          prerelease: false
          generate_release_notes: true
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
build/
146 changes: 146 additions & 0 deletions README.md
@@ -0,0 +1,146 @@
# S3Usage - S3 Bucket Usage Monitor for Ceph

S3Usage is a command-line application that monitors storage usage of S3 buckets in a Ceph cluster. It uses Ceph's RGW Admin Ops API to efficiently retrieve bucket statistics, stores the data in a SQLite database, calculates monthly averages, and provides commands to query historical usage information.

## Features

- Connect to the Ceph RGW Admin Ops API to retrieve bucket usage statistics efficiently
- Store usage data in a SQLite database
- Calculate monthly average bucket usage
- Display monthly usage for all buckets
- Query historical usage for specific buckets
- Prune old data points while preserving monthly statistics

## Implementation Details

This tool uses two different APIs:
- Standard S3 API to list the buckets (as a fallback)
- Ceph RGW Admin Ops API to fetch bucket statistics efficiently

The Admin API requests are properly signed using AWS Signature v4 authentication, which is required by Ceph RGW. Using the Admin Ops API provides significant performance improvements over listing all objects in buckets, making this tool suitable for monitoring Ceph S3 deployments with many large buckets.
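
For illustration, here is a minimal, self-contained sketch (not code from this repository) of signing an Admin Ops request with AWS Signature v4 using the `aws-sdk-go-v2` signer. The endpoint, credentials, and the `/admin/bucket?stats=true` path are placeholder assumptions:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	v4 "github.com/aws/aws-sdk-go-v2/aws/signer/v4"
)

// SHA-256 of an empty body, which SigV4 requires for payload-less requests.
const emptyPayloadHash = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

func main() {
	// Placeholder endpoint, path, and credentials for illustration only.
	req, err := http.NewRequest(http.MethodGet,
		"https://s3.example.com/admin/bucket?stats=true", nil)
	if err != nil {
		panic(err)
	}

	creds := aws.Credentials{
		AccessKeyID:     "YOUR_ACCESS_KEY",
		SecretAccessKey: "YOUR_SECRET_KEY",
	}

	// Sign the request for service "s3" in Ceph's "default" region.
	signer := v4.NewSigner()
	if err := signer.SignHTTP(context.Background(), creds, req,
		emptyPayloadHash, "s3", "default", time.Now()); err != nil {
		panic(err)
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(body))
}
```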

## Authentication Requirements

The tool signs Admin API requests with AWS Signature v4, the same scheme the RGW expects for regular S3 requests. This requires:

1. A Ceph user with administrative privileges
2. The access and secret keys for that user
3. The correct Ceph RGW endpoint URL

## Installation

### From Source

```bash
git clone https://github.com/thannaske/s3usage.git
cd s3usage
go build -o s3usage ./cmd/s3usage
# Optionally, move to a directory in your PATH
sudo mv s3usage /usr/local/bin/
```

## Usage

### Environment Variables

You can configure S3Usage using environment variables:

- `S3_ENDPOINT`: Ceph S3 endpoint URL (should be the RGW API endpoint)
- `S3_ACCESS_KEY`: S3 access key (requires admin privileges for RGW Admin API)
- `S3_SECRET_KEY`: S3 secret key
- `S3_REGION`: S3 region (default: "default")
- `S3_DB_PATH`: Path to SQLite database (default: `~/.s3usage.db`)

### Required Permissions

The access key used must have administrative privileges on the Ceph RGW to access the Admin API endpoints. You can create a user with the appropriate permissions using:

```bash
radosgw-admin user create --uid=s3usage --display-name="S3 Usage Monitor" --caps="buckets=*;users=*;usage=*;metadata=*;zone=*"
radosgw-admin key create --uid=s3usage --key-type=s3 --gen-access-key --gen-secret
```

### Collecting Usage Data

To collect bucket usage data and store it in the database, use the `collect` command:

```bash
s3usage collect --endpoint=https://s3.example.com --access-key=YOUR_ACCESS_KEY --secret-key=YOUR_SECRET_KEY
```

Or using environment variables:

```bash
export S3_ENDPOINT=https://s3.example.com
export S3_ACCESS_KEY=YOUR_ACCESS_KEY
export S3_SECRET_KEY=YOUR_SECRET_KEY
s3usage collect
```

This command is meant to be scheduled via cron to collect data regularly.

### Monthly Usage Report

To display the monthly average usage for all buckets:

```bash
s3usage list --year=2025 --month=2
```

If no year/month is specified, the previous month's data is shown.

### Bucket Usage History

To view historical usage data for a specific bucket:

```bash
s3usage history my-bucket-name
```

This shows a year's worth of historical data for the specified bucket.

### Pruning Old Data

To clean up individual data points from months that have already been aggregated into monthly averages:

```bash
s3usage prune
```

This will prompt for confirmation before deleting data. To skip the confirmation:

```bash
s3usage prune --confirm
```

The prune command only removes data points from completed months that already have monthly averages calculated. It preserves:
- All monthly average statistics
- Data points from the current month
- Data points from months without calculated averages

This helps keep the database size manageable over time without losing valuable statistics.
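
For the curious, below is a minimal sketch of the kind of deletion such a prune might run. The table names `bucket_usage` and `monthly_usage` and the `timestamp` column are assumptions for illustration, not this repository's actual schema:

```go
package main

import (
	"database/sql"
	"fmt"
	"time"

	_ "github.com/mattn/go-sqlite3" // SQLite driver
)

// pruneAggregatedMonths deletes raw data points from past months that
// already have a calculated monthly average. Table and column names are
// illustrative assumptions.
func pruneAggregatedMonths(db *sql.DB) (int64, error) {
	// Never touch the current month, regardless of aggregation state.
	currentMonth := time.Now().Format("2006-01")

	res, err := db.Exec(`
		DELETE FROM bucket_usage
		WHERE strftime('%Y-%m', timestamp) < ?
		  AND strftime('%Y-%m', timestamp) IN (
		      SELECT printf('%04d-%02d', year, month) FROM monthly_usage
		  )`, currentMonth)
	if err != nil {
		return 0, err
	}
	return res.RowsAffected()
}

func main() {
	db, err := sql.Open("sqlite3", "s3usage.db")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	n, err := pruneAggregatedMonths(db)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Pruned %d data points.\n", n)
}
```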

## Cron Setup

To collect data daily, add a cron job:

```bash
# Edit crontab
crontab -e

# Add this line to run daily at 23:45
45 23 * * * /usr/local/bin/s3usage collect --endpoint=https://s3.example.com --access-key=YOUR_ACCESS_KEY --secret-key=YOUR_SECRET_KEY
```

## Troubleshooting

If you encounter authentication issues:

1. Verify your user has the correct admin capabilities
2. Ensure your endpoint URL is correct (it should point to the RGW API endpoint)
3. Check that you're using the correct access and secret keys
4. Verify the region setting matches your Ceph configuration

## License

MIT
90 changes: 90 additions & 0 deletions cmd/s3usage/collect.go
@@ -0,0 +1,90 @@
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/spf13/cobra"
	"github.com/thannaske/s3usage/pkg/ceph"
	"github.com/thannaske/s3usage/pkg/db"
)

var collectCmd = &cobra.Command{
	Use:   "collect",
	Short: "Collect bucket usage data",
	Long:  `Collect usage data for all buckets and store it in the database.`,
	Run: func(cmd *cobra.Command, args []string) {
		// Validate required parameters
		if config.S3Endpoint == "" || config.S3AccessKey == "" || config.S3SecretKey == "" {
			fmt.Println("Error: Missing required S3 credentials. Please provide --endpoint, --access-key, and --secret-key.")
			return
		}

		// Initialize the database
		database, err := db.NewDB(config.DBPath)
		if err != nil {
			fmt.Printf("Error connecting to database: %v\n", err)
			return
		}
		defer database.Close()

		err = database.InitDB()
		if err != nil {
			fmt.Printf("Error initializing database: %v\n", err)
			return
		}

		// Initialize the S3 client
		s3Client, err := ceph.NewS3Client(config)
		if err != nil {
			fmt.Printf("Error initializing S3 client: %v\n", err)
			return
		}

		// Get usage data for all buckets
		fmt.Println("Collecting bucket usage data...")
		usages, err := s3Client.GetAllBucketsUsage(context.Background())
		if err != nil {
			fmt.Printf("Error collecting bucket usage data: %v\n", err)
			return
		}

		// Store usage data in the database
		for _, usage := range usages {
			err = database.StoreBucketUsage(usage)
			if err != nil {
				fmt.Printf("Error storing usage data for bucket %s: %v\n", usage.BucketName, err)
				continue
			}
			fmt.Printf("Stored usage data for bucket %s: %d bytes, %d objects\n",
				usage.BucketName, usage.SizeBytes, usage.ObjectCount)
		}

		// Check if we need to calculate monthly averages
		now := time.Now()
		// If it's the end of the month (last day), calculate monthly averages
		if now.Day() == getDaysInMonth(now.Year(), int(now.Month())) {
			fmt.Println("Calculating monthly averages...")
			err = database.CalculateMonthlyAverages(now.Year(), int(now.Month()))
			if err != nil {
				fmt.Printf("Error calculating monthly averages: %v\n", err)
				return
			}
			fmt.Println("Monthly averages calculated successfully.")
		}

		fmt.Println("Collection completed successfully.")
	},
}
// getDaysInMonth returns the number of days in the given month.
// time.Date normalizes out-of-range values, so day 0 of the following
// month resolves to the last day of the requested month,
// e.g. getDaysInMonth(2025, 2) == 28.
func getDaysInMonth(year, month int) int {
	t := time.Date(year, time.Month(month+1), 0, 0, 0, 0, 0, time.UTC)
	return t.Day()
}

func init() {
	rootCmd.AddCommand(collectCmd)
}