Add videogen #6

Open
wants to merge 6 commits into
base: main
66 changes: 56 additions & 10 deletions README.md
@@ -1,17 +1,21 @@
# Flux Generator: macOS MLX-Powered Image & Music Generation with Open WebUI compatible API for Image generation
# Flux Generator: macOS MLX-Powered Image, Music & Video Generation with Open WebUI compatible API for Image generation

## Features

- Text-to-image generation
- Text-to-music generation (NEW!)
- Text-to-music generation
- Text-to-video generation (NEW!)
- Multiple model options:
- Image Generation:
- black-forest-labs Flux schnell/dev
- stabilityai sdxl-turbo/stable-diffusion-2-1
- Music Generation:
- facebook/musicgen-medium
- Video Generation:
- Wan-AI/Wan2.1-T2V-1.3B
- Customizable image size and generation parameters
- Advanced music generation controls
- Video generation with adjustable frames, resolution, and parameters
- Memory usage reporting
- API compatibility for Image generation for third-party UIs like Open WebUI
- Unified server for both UI and API
@@ -22,26 +26,37 @@
This repository utilizes the MLX framework, designed specifically for Apple Silicon, to provide optimized performance for:
- Black Forest Flux and Stable Diffusion image generation
- Facebook's MusicGen audio generation
- Wan-AI's text-to-video generation

MLX leverages the unified memory architecture of Apple's M-series chips, enabling faster and more efficient computations.

### Why MLX?

* **Performance:** Experience significant speed improvements compared to other frameworks on Apple Silicon.
* **Local Execution:** Run Stable Diffusion models locally on your Mac, ensuring data privacy and enabling offline use.
* **Local Execution:** Run models locally on your Mac, ensuring data privacy and enabling offline use.
* **Fine-Tuning:** MLX provides a great environment for fine-tuning models on Apple Silicon.
* **Memory Efficiency:** Optimized memory management for large models and video generation.

For more examples of what MLX can do, check out the official mlx-examples repository: [https://github.com/ml-explore/mlx-examples](https://github.com/ml-explore/mlx-examples)

This repository is designed to give Apple Silicon users a fast and easy way to generate images locally.
This repository is designed to give Apple Silicon users a fast and easy way to generate content locally.

## UI Screenshots:
![Flux image generation Tab in UI](flux_app_ui_imagegen.jpg)
![Flux music generation Tab in UI](flux_app_ui_musicgen.jpg)
![Flux video generation Tab in UI](flux_app_ui_videogen.jpg)

## System Requirements

## Example Generation
- macOS with Apple Silicon (M1/M2/M3)
- Python 3.10+ (tested with Python 3.11)
- MLX framework
- 32GB RAM recommended (especially for video generation)
- Additional libraries for audio and video processing

## Example Generations

### Image Generation
Here's an example image generated using the Flux model:

![Moonset over ocean](generated_moonset.png)
@@ -53,6 +68,19 @@ Parameters:
- Steps: 2
- CFG Scale: 4.0

### Video Generation
Parameters for optimal video generation:
- Number of frames: 16-32
- Resolution: 256x256 to 512x512
- Steps: 50 (recommended)
- Guidance Scale: 7.5 (adjustable)
- FPS: 8 (default)

Example prompts for video generation:
- "A blooming flower timelapse, highly detailed"
- "Waves crashing on a beach at sunset"
- "Space journey through colorful nebulas"

## Requirements

- macOS with Apple Silicon (M1/M2/M3)
@@ -179,9 +207,11 @@ Once the server is running (either via `run_flux.sh` or manually):
2. Choose your desired generation mode:
- 🖼️ Image Generation: Enter a prompt, select a model and click generate
- 🎵 Music Generation: Enter a music description and adjust parameters
- 🎥 Video Generation: Enter a video description and adjust parameters
3. On first use, models will be downloaded:
- Image models: approximately 30 GB
- MusicGen model: approximately 3.5 GB
- Video model: approximately 5.68 GB
4. Download progress will be visible in the terminal
5. Once downloaded, generation will begin

@@ -201,9 +231,19 @@ The music generation interface provides several parameters to control the output
- **Top K**: Controls diversity of the output (50-500)
- **Guidance Scale**: Controls how closely to follow the prompt (1.0-10.0)

### Video Generation Parameters

The video generation interface provides several parameters to control the output:

- **Number of frames**: Controls the length of the generated video (16-32)
- **Resolution**: Controls the resolution of the generated video (256x256 to 512x512)
- **Steps**: Controls the number of steps for video generation (50 recommended)
- **Guidance Scale**: Controls how closely to follow the prompt (7.5 adjustable)
- **FPS**: Controls the frames per second of the generated video (8 default)
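
As a rough illustration of how these parameters relate, clip length is the frame count divided by the FPS. The sketch below is not taken from the project's code: the `VideoParams` class, its field names, and its default frame count and resolution are hypothetical, and the ranges simply mirror the values listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoParams:
    """Hypothetical container mirroring the documented UI parameters."""

    num_frames: int = 16          # assumed default (low end of 16-32)
    width: int = 256              # assumed default (low end of 256-512)
    height: int = 256
    steps: int = 50               # recommended value from the list above
    guidance_scale: float = 7.5   # documented starting value
    fps: int = 8                  # documented default

    def validate(self) -> None:
        # Ranges are copied from the README list, not from the model itself.
        if not 16 <= self.num_frames <= 32:
            raise ValueError("num_frames should be between 16 and 32")
        for side in (self.width, self.height):
            if not 256 <= side <= 512:
                raise ValueError("resolution should be between 256x256 and 512x512")
        if self.fps <= 0:
            raise ValueError("fps must be positive")

    @property
    def duration_seconds(self) -> float:
        # Clip length is frames divided by frames per second.
        return self.num_frames / self.fps


params = VideoParams(num_frames=32, fps=8)
params.validate()
print(params.duration_seconds)  # 4.0
```

With the documented defaults, the 16-32 frame range at 8 FPS corresponds to clips of roughly 2 to 4 seconds.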

## API Integration

The application provides an API that can be used with third-party UIs like Open WebUI.
The application provides an image generation API that can be used with third-party UIs like Open WebUI.
Check this tutorial for Open WebUI integration instructions:
[Tutorial](https://voipnuggets.com/2025/02/18/flux-generator-local-image-generation-on-apple-silicon-with-open-webui-integration-using-flux-llm/)
@@ -345,13 +385,17 @@ The Flux server requires model files to be downloaded before use. You can downlo

# Download MusicGen model
huggingface-cli download facebook/musicgen-medium

# Download Video Generation model
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B
```

3. Using the command-line interface:
Note: Each Flux model is approximately 24GB in size, the SD models are bigger. The download includes:
- Model weights (flux1-{model}.safetensors)
- Autoencoder (ae.safetensors)
- Text encoders and tokenizers
Note: Total model file sizes:
- Each Flux model: ~32GB
- Each Stable Diffusion model: ~34GB
- MusicGen model: ~11GB
- Video model: ~22GB
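
To get a feel for the total footprint before downloading everything, the figures in this note can be summed and compared against free disk space. This is only a sketch: the dictionary labels and the `has_room` helper are hypothetical, and the sizes are the approximate values quoted in this note, not measured ones.

```python
import shutil

# Approximate sizes in GB, copied from the note above (two Flux and two SD models).
MODEL_SIZES_GB = {
    "Flux schnell": 32,
    "Flux dev": 32,
    "stable-diffusion-2-1-base": 34,
    "sdxl-turbo": 34,
    "musicgen-medium": 11,
    "Wan2.1-T2V-1.3B": 22,
}


def total_size_gb(sizes: dict) -> int:
    """Sum the approximate download sizes."""
    return sum(sizes.values())


def has_room(path: str = ".", sizes: dict = MODEL_SIZES_GB) -> bool:
    """Return True if the volume holding `path` has enough free space."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= total_size_gb(sizes)


print(f"All models: ~{total_size_gb(MODEL_SIZES_GB)} GB")  # All models: ~165 GB
```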

# Download command for all models
```bash
@@ -360,6 +404,7 @@ The Flux server requires model files to be downloaded before use. You can downlo
huggingface-cli download stabilityai/stable-diffusion-2-1-base
huggingface-cli download stabilityai/sdxl-turbo
huggingface-cli download facebook/musicgen-medium
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B
```

Model Repos:
@@ -368,6 +413,7 @@ https://huggingface.co/black-forest-labs/FLUX.1-dev
https://huggingface.co/stabilityai/stable-diffusion-2-1-base
https://huggingface.co/stabilityai/sdxl-turbo
https://huggingface.co/facebook/musicgen-medium
https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B

Model files are stored in the HuggingFace cache directory (`~/.cache/huggingface/hub/`).
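
A quick way to see whether a model is already present is to look for its folder in that cache. The sketch below assumes the default cache location quoted above (the `HF_HOME` or `HF_HUB_CACHE` environment variables can move it) and relies on the hub's `models--<org>--<name>` folder naming; the helper names are hypothetical and not part of this project.

```python
from pathlib import Path

# Default location quoted in the README; environment variables may override it.
DEFAULT_HUB_CACHE = Path.home() / ".cache" / "huggingface" / "hub"


def cached_repo_dir(repo_id: str, cache_dir: Path = DEFAULT_HUB_CACHE) -> Path:
    """Map a repo id such as 'org/name' to its cache folder."""
    # The hub cache stores each model repo under "models--<org>--<name>".
    return cache_dir / ("models--" + repo_id.replace("/", "--"))


def is_downloaded(repo_id: str, cache_dir: Path = DEFAULT_HUB_CACHE) -> bool:
    """Heuristic: a non-empty 'snapshots' folder means a download exists."""
    snapshots = cached_repo_dir(repo_id, cache_dir) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())


for repo in ("facebook/musicgen-medium", "Wan-AI/Wan2.1-T2V-1.3B"):
    print(repo, "cached" if is_downloaded(repo) else "not cached")
```

This only checks that a snapshot folder exists; it does not verify that every file finished downloading.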

116 changes: 116 additions & 0 deletions build.py
@@ -0,0 +1,116 @@
"""
PyInstaller build script for Flux Generator

Prerequisites:
> python3 -m venv venv
> source venv/bin/activate (or source venv/bin/activate.fish for fish shell)
> python3 -m pip install -r requirements.txt
> python3 -m pip install pyinstaller
> python3 build.py

Platform specific libraries that MIGHT be needed:
MacOS:
- brew install portaudio
"""

import os
import platform
import sys
import PyInstaller.__main__

def build(signing_key=None):
    app_name = 'Flux\\ Generator'
    compile(signing_key)

    macos = platform.system() == 'Darwin'
    if macos and signing_key:
        # Codesign
        os.system(
            f'codesign --deep --force --verbose --sign "{signing_key}" dist/{app_name}.app --options runtime')

    zip_name = zip()

    if signing_key:
        keychain_profile = signing_key.split('(')[0].strip()
        # Notarize
        os.system(f'xcrun notarytool submit --wait --keychain-profile "{keychain_profile}" --verbose dist/{zip_name}')
        input(f'Check whether notarization was successful using \n\t xcrun notarytool history --keychain-profile {keychain_profile}.\nYou can check debug logs using \n\t xcrun notarytool log --keychain-profile "{keychain_profile}" <run-id>')

        # Staple
        os.system(f'xcrun stapler staple dist/{app_name}.app')

        # Zip the signed, stapled file
        zip_name = zip()

def compile(signing_key=None):
    # Path to main application script
    app_script = 'flux_app.py'

    # Common PyInstaller options
    pyinstaller_options = [
        '--clean',
        '--noconfirm',

        # --- Basics --- #
        '--name=Flux Generator',
        '--icon=resources/icon.png',  # You'll need to create this
        '--windowed',  # GUI application

        # Where to find necessary packages
        '--paths=./venv/lib/python3.11/site-packages',

        # Required imports
        '--hidden-import=mlx',
        '--hidden-import=gradio',
        '--hidden-import=fastapi',
        '--hidden-import=transformers',
        '--hidden-import=huggingface_hub',
        '--hidden-import=soundfile',
        '--hidden-import=scipy',
        '--hidden-import=tqdm',

        # Static files and resources
        '--add-data=musicgen:musicgen',
        '--add-data=resources/*:resources',

        # Main script
        app_script
    ]

    # Platform-specific options
    if platform.system() == 'Darwin':  # MacOS
        if signing_key:
            pyinstaller_options.extend([
                f'--codesign-identity={signing_key}'
            ])

    # Run PyInstaller
    PyInstaller.__main__.run(pyinstaller_options)
    print('Done. Check dist/ for executables.')

def zip():
    # Zip the app
    print('Zipping the executables')
    app_name = 'Flux\\ Generator'
    zip_name = 'Flux-Generator'

    if platform.system() == 'Darwin':  # MacOS
        if platform.processor() == 'arm':
            zip_name = zip_name + '-MacOS-M-Series' + '.zip'
        else:
            zip_name = zip_name + '-MacOS-Intel' + '.zip'
        # Special zip command for macos to keep the complex directory metadata intact
        zip_cli_command = 'cd dist/; ditto -c -k --sequesterRsrc --keepParent ' + app_name + '.app ' + zip_name

    os.system(zip_cli_command)
    return zip_name

if __name__ == '__main__':
    apple_code_signing_key = None
    if len(sys.argv) > 1:
        apple_code_signing_key = sys.argv[1]  # python3 build.py "Developer ID Application: ... (...)"
        print("apple_code_signing_key: ", apple_code_signing_key)
    elif len(sys.argv) == 1 and platform.system() == 'Darwin':
        input("Are you sure you don't want to sign your code? ")

    build(apple_code_signing_key)
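
One possible hardening for the signing step, offered as a sketch rather than a change to this PR: building the command as an argument list and running it with `subprocess.run` avoids shell-escaping the space in the app name and raises on a non-zero exit status, which `os.system` only returns. The helper names `codesign_command` and `run_checked` are hypothetical; the `codesign` flags are the ones already used in `build()`.

```python
import subprocess
from pathlib import Path

APP_NAME = "Flux Generator"  # no backslash escaping needed with argument lists


def codesign_command(signing_key: str, dist_dir: str = "dist") -> list[str]:
    """Build the codesign invocation from build() as an argv list."""
    app_path = Path(dist_dir) / f"{APP_NAME}.app"
    return [
        "codesign", "--deep", "--force", "--verbose",
        "--sign", signing_key,
        "--options", "runtime",
        str(app_path),
    ]


def run_checked(command: list[str]) -> None:
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder identity for illustration only; nothing is executed here.
    print(codesign_command("Developer ID Application: Example (TEAMID)"))
```

The same pattern would apply to the `xcrun notarytool`, `xcrun stapler`, and `ditto` calls.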
25 changes: 25 additions & 0 deletions create_icns.sh
@@ -0,0 +1,25 @@
#!/bin/bash

# Create iconset directory
mkdir -p resources/icon.iconset

# Copy PNG files to iconset with correct names
cp resources/icon_16x16.png resources/icon.iconset/icon_16x16.png
cp resources/icon_32x32.png resources/icon.iconset/icon_32x32.png
cp resources/icon_32x32.png resources/icon.iconset/icon_16x16@2x.png
cp resources/icon_64x64.png resources/icon.iconset/icon_32x32@2x.png
cp resources/icon_128x128.png resources/icon.iconset/icon_128x128.png
cp resources/icon_256x256.png resources/icon.iconset/icon_256x256.png
cp resources/icon_256x256.png resources/icon.iconset/icon_128x128@2x.png
cp resources/icon_512x512.png resources/icon.iconset/icon_512x512.png
cp resources/icon_512x512.png resources/icon.iconset/icon_256x256@2x.png
cp resources/icon_1024x1024.png resources/icon.iconset/icon_512x512@2x.png

# Create icns file
iconutil -c icns resources/icon.iconset

# Clean up temporary files
rm -rf resources/icon.iconset
rm resources/icon_*.png

echo "Created resources/icon.icns"
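
The `cp` lines above follow a regular pattern: each nominal icon size gets a 1x file at that pixel size and an `@2x` file at twice the pixel size. The sketch below reproduces that mapping in Python for illustration; the function name `iconset_entries` is hypothetical and is not part of this PR.

```python
def iconset_entries(base_sizes=(16, 32, 128, 256, 512)):
    """Return (iconset file name, source pixel size) pairs for a macOS .iconset."""
    entries = []
    for size in base_sizes:
        # The 1x variant uses the nominal size; the @2x variant uses twice the pixels.
        entries.append((f"icon_{size}x{size}.png", size))
        entries.append((f"icon_{size}x{size}@2x.png", size * 2))
    return entries


for name, pixels in iconset_entries():
    # Source files are the PNGs written by create_icon.py.
    print(f"resources/icon_{pixels}x{pixels}.png -> resources/icon.iconset/{name}")
```

The set of source pixel sizes this produces (16, 32, 64, 128, 256, 512, 1024) matches the `sizes` list in `create_icon.py`.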
60 changes: 60 additions & 0 deletions create_icon.py
@@ -0,0 +1,60 @@
from PIL import Image, ImageDraw
import os

def create_icon(size=1024):
    # Create a new image with a transparent background
    icon = Image.new('RGBA', (size, size), (255, 255, 255, 0))
    draw = ImageDraw.Draw(icon)

    # Colors
    primary_color = (64, 156, 255)  # Blue for water/flux
    accent_color = (255, 128, 64)  # Orange for music/energy

    # Calculate dimensions
    padding = size // 8
    center = size // 2
    radius = (size - 2 * padding) // 2

    # Draw the main circular gradient (representing flux/flow)
    for r in range(radius, 0, -1):
        alpha = int(255 * (r / radius))
        color = (*primary_color, alpha)
        draw.ellipse(
            [center - r, center - r, center + r, center + r],
            fill=color
        )

    # Draw the music wave pattern
    wave_height = size // 4
    wave_width = size // 16
    wave_y = center + radius // 2

    # Draw three wave bars with varying heights
    heights = [0.8, 1.0, 0.6]  # Relative heights for visual interest
    for i, height in enumerate(heights):
        x = center + (i - 1) * (wave_width * 2)
        h = int(wave_height * height)
        draw.rounded_rectangle(
            [x - wave_width//2, wave_y - h//2,
             x + wave_width//2, wave_y + h//2],
            radius=wave_width//4,
            fill=(*accent_color, 200)
        )

    # Save in different sizes for macOS
    if not os.path.exists('resources'):
        os.makedirs('resources')

    # Save the main icon
    icon.save('resources/icon.png')

    # Create .icns compatible sizes
    sizes = [16, 32, 64, 128, 256, 512, 1024]
    for s in sizes:
        resized = icon.resize((s, s), Image.Resampling.LANCZOS)
        resized.save(f'resources/icon_{s}x{s}.png')

    print("Icon files generated in resources/ directory")

if __name__ == '__main__':
    create_icon()