Model Studio

Industrial-Grade
Model Training.

Train, evaluate, and deploy high-performance ML models on your own infrastructure with a seamless workflow.

Training Pipeline

Data Staging

Ingest and version raw data with MinIO + Parquet chunking.

Chunking

Automatically partition data into optimized training batches.

Training

Run distributed training with live loss monitoring.

Registry

Promote models to a versioned, production-ready registry.

170+

Models Created

218

Training Runs

27

Deployed

88.2%

Avg Accuracy

Two Ways to Build Your Model

Choose the path that fits your team's technical expertise.

No Code Required

Guided ML

Perfect for business analysts and domain experts. Build high-accuracy models using our automated machine learning (AutoML) pipeline. No background in data science or programming needed.

1. Model Type: Choose from pre-configured business templates
2. Dataset: Point-and-click data selection from any source
3. Configure: Simple target selection and feature toggling
4. Review & Deploy: One-click deployment to production endpoints

Bring Your Code

Custom Model

For data scientists who want full control. Upload your own Python code or link a Git repository. RiverGen handles the infrastructure, scaling, and versioning for you.

1. Model Architecture: Support for PyTorch, TensorFlow, Scikit-Learn
2. Git Integration: Directly pull and sync from GitHub or GitLab
3. Hyperparameters: Fine-tune with custom training arguments
4. Custom Configure: Define entry points and environment setups
5. Review & Deploy: Scale across GPU-accelerated clusters

From Data to Deployed Model in 4 Steps

The intuitive interface that makes machine learning accessible.


Choose Your Foundation

RiverGen provides pre-trained architectures optimized for common business problems. Each template comes with built-in best practices for data processing and evaluation.

High Performance

Classification

Predict categories or binary outcomes from structured data.

Common Use Cases

Customer churn risk
Lead qualification
Fraud detection

High Performance

Regression

Predict precise numerical values and ranges.

Common Use Cases

LTV estimation
House price valuation
Supply chain lead times

Time Series

Identify patterns and forecast future temporal values.

Common Use Cases

Weekly revenue goals
Energy consumption
Inventory demand

Clustering

Discover hidden segments and patterns in unlabeled data.

Common Use Cases

Customer personas
Topic modeling
Network anomalies

Custom Model — Bring Your Own Code

Total flexibility for research teams and ML engineers. Integrate your existing codebases, research notebooks, and custom training logic into our industrial-grade orchestration layer.

Select Custom Model Template

Unlock code-level controls by choosing the custom container template. Perfect for PyTorch, TensorFlow, or JAX research projects that require unique environment configurations.

Code Integration & Git Sync

Directly sync from private GitHub or GitLab repositories. We support multi-file projects and automatically detect your dependencies from requirements.txt, conda.yml, or pyproject.toml.

Drag source or .zip here

Max size 100MB · Automatic conda.yml detection
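The dependency detection described above can be sketched in a few lines. This is an illustrative stand-in, not RiverGen's actual implementation; the file names checked and their precedence order are assumptions based on the list in the text.

```python
from pathlib import Path
from typing import Optional
import tempfile

# Dependency specs checked, in an assumed precedence order (illustrative;
# the actual MSA detection order is not documented here).
SPEC_FILES = ["requirements.txt", "conda.yml", "pyproject.toml"]

def detect_dependency_spec(project_dir: str) -> Optional[str]:
    """Return the first recognized dependency file in a project, if any."""
    root = Path(project_dir)
    for name in SPEC_FILES:
        if (root / name).is_file():
            return name
    return None

# Demo: a throwaway project containing only a requirements.txt
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "requirements.txt").write_text("torch==2.0.1\n")
    print(detect_dependency_spec(tmp))  # requirements.txt
```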

Secure Data Reference

Reference your Data Sources directly in code. Model Studio handles secure ingestion, credential management, and high-speed delivery to your training nodes.

Advanced Configuration

Define custom hyperparameters, entry points, and CLI arguments. Our orchestrator injects these directly into your training environment.

Hyperparameters

Epochs: 100
Learning Rate: 2e-5
Weight Decay: 0.01
Batch Size: 32

Entry Script

python -m research.train --cfg=conf/dist.yaml
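On the receiving end, an entry script only needs to declare flags matching the injected hyperparameters. A minimal sketch, assuming flag names that mirror the panel above (they are not a documented contract):

```python
import argparse

# Illustrative entry-script stub: the orchestrator injects hyperparameters
# and CLI arguments into the training environment, so the script just
# declares matching flags. Names here are assumptions, not RiverGen's API.
parser = argparse.ArgumentParser(description="training entry point")
parser.add_argument("--cfg", default="conf/dist.yaml")
parser.add_argument("--epochs", type=int, default=100)
parser.add_argument("--learning-rate", type=float, default=2e-5)
parser.add_argument("--weight-decay", type=float, default=0.01)
parser.add_argument("--batch-size", type=int, default=32)

# Simulate the injected invocation instead of reading sys.argv directly.
args = parser.parse_args(["--cfg=conf/dist.yaml", "--epochs=100", "--batch-size=32"])
print(args.epochs, args.batch_size, args.cfg)  # 100 32 conf/dist.yaml
```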

High-Compute Provisioning

Select from standard or high-memory GPU instances. Launch distributed training runs across NVIDIA Tesla T4 or A100 clusters with one click.

research_v2/dist_train.py
from rivergen import ModelStudio
import torch
import torch.nn as nn

class TransformerModel(nn.Module):
    def __init__(self, d_model=128):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8), 
            num_layers=6
        )
        self.fc = nn.Linear(d_model, 1)

    def forward(self, x):
        return self.fc(self.encoder(x))

# Establish secure connection
studio = ModelStudio.connect()

# Launch distributed training
studio.train(
    model=TransformerModel(),
    data="financial_ops_prod",
    config="configs/prod_v2.yaml",
    gpu=True,
    nodes=4
)

Runtime Compatibility

Python 3.9+ · PyTorch 2.0 · TensorFlow · CUDA 11.8

How Your Data Gets There

Secure, automated data ingestion — no pipeline code required. We bridge the gap between your raw operational data and high-performance training environments.

Source Systems

Code, No-Code, or APIs

MSA Ingestion

Validation & Schema Sync

MinIO Storage

Immutable Parquet Chunks

Training Compute

Distributed GPU Clusters

Automated MSA Batching

The Model Studio Agent (MSA) automatically handles the heavy lifting of data preparation. It partitions massive datasets into optimized Parquet format, ensuring sub-millisecond retrieval during training epochs while maintaining strict schema consistency across every run.

Auto-Partitioning
Zero-Copy Ingest
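The partitioning step MSA performs can be modeled in a few lines. This sketch only illustrates the chunking logic in memory; the real agent writes each chunk out as an immutable Parquet file, and the chunk size shown is arbitrary.

```python
from typing import Iterable, Iterator, List

def partition_rows(rows: Iterable[dict], chunk_size: int) -> Iterator[List[dict]]:
    """Yield fixed-size row chunks, modeling MSA-style batch partitioning.

    Illustrative only: the real agent persists each chunk as an immutable
    Parquet file in MinIO; here we just show the partitioning step.
    """
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == chunk_size:
            yield batch
            batch = []
    if batch:  # flush the final partial chunk
        yield batch

rows = ({"id": i} for i in range(10))
chunks = list(partition_rows(rows, chunk_size=4))
print([len(c) for c in chunks])  # [4, 4, 2]
```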

Secure Immutable Staging

Every byte of your data is staged in a private, encrypted MinIO environment. We use immutable snapshots to ensure that model results are 100% reproducible. Your raw data never leaves your infrastructure perimeter, maintaining total compliance and security.

End-to-End Encryption
SOC2 Compliant

Train on Your Schedule

Model Studio doesn't just train once. It becomes a living part of your data stack, ensuring your intelligence layer evolves as fast as your business.

Scheduled Retraining

Maintain model freshness by setting cron-based schedules. Automatically retrain nightly or weekly to capture the latest market shifts and user behaviors without manual intervention.

Event-Triggered Jobs

Trigger training runs based on data thresholds or system events. If your dataset grows by 10% or a new batch of labeled data arrives, Model Studio starts work immediately.
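The growth rule described above reduces to a simple threshold check. A minimal sketch of that rule; the real trigger evaluation runs inside Model Studio, not in user code, and the 10% default mirrors the example in the text.

```python
def should_retrain(rows_at_last_train: int, rows_now: int,
                   growth_threshold: float = 0.10) -> bool:
    """Fire a retraining job once the dataset has grown past the threshold.

    Illustrative sketch of the event-trigger rule described above.
    """
    if rows_at_last_train == 0:
        return rows_now > 0  # any data at all warrants a first run
    growth = (rows_now - rows_at_last_train) / rows_at_last_train
    return growth >= growth_threshold

print(should_retrain(1_000_000, 1_050_000))  # False: only 5% growth
print(should_retrain(1_000_000, 1_100_000))  # True: 10% growth hits the threshold
```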

Intelligent Monitoring

Stay informed with context-rich alerts. Get notified via Slack, Email, or Webhooks about convergence success, accuracy improvements, or infrastructure health.

Centralized Model Registry

Your organization's institutional knowledge, versioned and protected. Every model is a documented asset ready for deployment.

Zero-Downtime Hot-Swapping

Deploy new versions to production APIs without dropping a single request.

Immutable Model Snapshots

Every version is permanently archived with its weights, code, and data links.

Native A/B Testing

Route traffic between versions to validate real-world performance improvements.
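Hash-based bucketing is one common way to implement this kind of traffic split: each request is routed deterministically, so a user always sees the same version. A sketch under that assumption, not RiverGen's documented mechanism.

```python
import hashlib

def route_version(request_id: str, candidate_share: float = 0.1) -> str:
    """Deterministically route a request to 'candidate' or 'production'.

    Illustrative hash-based traffic splitting; the share and routing key
    are assumptions, not RiverGen's actual A/B implementation.
    """
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = digest[0] / 255.0  # stable value in [0, 1] per request id
    return "candidate" if bucket < candidate_share else "production"

counts = {"candidate": 0, "production": 0}
for i in range(1000):
    counts[route_version(f"req-{i}")] += 1
print(counts["candidate"] + counts["production"])  # 1000
```

Because the split is keyed on the request id, rerunning the router on the same traffic reproduces the exact same assignment, which keeps A/B metrics comparable across runs.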

customer_churn

v3.2.0 · Trained on 1.2M rows
Production

Architecture

XGBoost

Metric (Acc)

94.2%

revenue_forecast

v1.5.1 · Resource timeout
Retrying

Architecture

LSTM

Metric (Acc)

72.0%

fraud_detect

v4.0.0 · Candidate for prod
Staging

Architecture

Transformer

Metric (Acc)

98.8%

inventory_opt

v2.1.0 · Configuring features
Draft

Architecture

Prophet

Metric (Acc)

--

Showing 4 of 128 registered models

What is Model Studio?

The brains powering every prompt and answer in RiverGen.

Model Studio manages which intelligence settings are active, how they are being used, and where they are connected in your workflows. While the underlying technology can be complex, Model Studio keeps this complexity hidden from everyday users and surfaces only simple, understandable controls.

For Non-Technical Teams

Roll out improvements once. Every team benefits automatically.

When you run a prompt, it uses an approved and well-tuned intelligence layer. Changes can be rolled out centrally so prompts across the organization benefit from improvements — without each user having to change anything. Over time, Model Studio keeps answers accurate, consistent, and aligned with business needs.

Live Training Monitor

Eliminate the "black box" of AI training. RiverGen provides deep visibility into your model's learning process, allowing teams to optimize resources and halt failing runs instantly.

Training Loss (Convergence) · Live · Epoch 9/20

Current Loss: 0.082 (last epoch)
Best Loss: 0.079 (epoch 7)
ETA: ~12m remaining

Industrial Orchestration

From simple linear models to complex deep learning architectures, our SDK handles the plumbing. Launch, scale, and monitor jobs with just a few lines of code.

rivergen_sdk/train_job.py
from rivergen import ModelStudio

# Initialize session
studio = ModelStudio.connect()

# Create managed training job
job = studio.train(
  model="./models/transformer_v2",
  data_source="prod_postgres",
  compute="gpu_standard",
  config={
    "epochs": 20,
    "batch_size": 64,
    "optimizer": "adam"
  }
)

# Stream live telemetry
job.monitor(live=True)
Auto-Scaling · Resource Isolation · Audit Logging