Pulp Hugging Face Plugin
A Pulp plugin for managing Hugging Face Hub content with pull-through caching support.
Features
- Pull-through caching: Automatically fetch and cache content from Hugging Face Hub on first access
- Support for all Hugging Face content types: Models, datasets, and spaces
- Authentication support: Use Hugging Face tokens for private repositories
- API proxying: Forward API requests to Hugging Face Hub for metadata operations
- File downloads: Cache and serve model files, configuration files, and other artifacts
Installation
pip install pulp-hugging-face
Usage
Setting up a Remote
First, create a Hugging Face remote that points to the Hugging Face Hub using the REST API:
# Using curl (REST API)
curl -X POST http://localhost:5001/pulp/api/v3/remotes/hugging_face/hugging-face/ \
-H "Content-Type: application/json" \
-u admin:password \
-d '{
"name": "hf-remote",
"hf_hub_url": "https://huggingface.co",
"policy": "on_demand"
}'
For private repositories, include your Hugging Face token:
# Using curl with authentication token
curl -X POST http://localhost:5001/pulp/api/v3/remotes/hugging_face/hugging-face/ \
-H "Content-Type: application/json" \
-u admin:password \
-d '{
"name": "hf-private",
"hf_hub_url": "https://huggingface.co",
"policy": "on_demand",
"hf_token": "YOUR_HF_TOKEN"
}'
Creating a Distribution with Pull-through Caching
Create a distribution that uses the remote for pull-through caching:
# First get the remote href
REMOTE_HREF=$(curl -s http://localhost:5001/pulp/api/v3/remotes/hugging_face/hugging-face/ -u admin:password | jq -r '.results[] | select(.name=="hf-remote") | .pulp_href')
# Create distribution
curl -X POST http://localhost:5001/pulp/api/v3/distributions/hugging_face/hugging-face/ \
-H "Content-Type: application/json" \
-u admin:password \
-d "{
\"name\": \"hf-proxy\",
\"base_path\": \"huggingface\",
\"remote\": \"$REMOTE_HREF\"
}"
Note: CLI support (pulp hugging-face commands) is planned but not yet implemented. For now, use the REST API directly or write a small script for automation.
Accessing Content
Once configured, you can access Hugging Face content through your Pulp instance:
# Download a model file
curl http://your-pulp-instance/pulp/content/huggingface/microsoft/DialoGPT-medium/resolve/main/config.json
# Access API endpoints
curl http://your-pulp-instance/pulp/content/huggingface/api/models/microsoft/DialoGPT-medium
# List repository files
curl http://your-pulp-instance/pulp/content/huggingface/api/models/microsoft/DialoGPT-medium/tree/main
How Pull-through Caching Works
- First request: When content is requested but not available locally, Pulp fetches it from Hugging Face Hub
- Caching: The content is stored locally and associated with the appropriate metadata
- Subsequent requests: Future requests for the same content are served from the local cache
- API forwarding: API requests are forwarded to Hugging Face Hub for real-time metadata
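One rough way to observe this behavior is to time the same download twice: if caching works as described, the second request should normally complete faster than the first. The sketch below assumes curl is installed; time_fetch is a hypothetical helper name, the host is the same placeholder used in the examples above, and real timings will vary with network conditions and file size.

```shell
# Hypothetical helper: print the total transfer time (in seconds) for one GET.
# -s silences progress output, -L follows redirects,
# -o /dev/null discards the body, -w prints curl's total request time.
time_fetch() {
    curl -sL -o /dev/null -w '%{time_total}\n' "$1"
}

# Usage (placeholder host, so nothing is requested automatically):
#   url=http://your-pulp-instance/pulp/content/huggingface/microsoft/DialoGPT-medium/resolve/main/config.json
#   time_fetch "$url"   # first request: Pulp fetches the file from Hugging Face Hub
#   time_fetch "$url"   # second request: expected to be served from the local cache
```

A single pair of timings is only an indication; repeating the measurement with a larger file makes the difference easier to see.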
Supported URL Patterns
The plugin supports the standard Hugging Face Hub URL patterns:
- File downloads: /{repo_id}/resolve/{revision}/{filename}
- API endpoints: /api/models/{repo_id}, /api/datasets/{repo_id}, /api/spaces/{repo_id}
- Repository trees: /api/{repo_type}s/{repo_id}/tree/{revision}
- Git LFS: /api/{repo_type}s/{repo_id}/git/lfs/*
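To make the patterns concrete, here is a minimal sketch of shell helpers that expand them into request paths. The function names are hypothetical and not part of the plugin; the base URL is the placeholder host and the huggingface base path used in the earlier examples.

```shell
# Hypothetical helpers that print request paths following the patterns above.
# repo_type is the singular form: model, dataset, or space.

hf_resolve_path() {   # args: repo_id revision filename
    printf '/%s/resolve/%s/%s\n' "$1" "$2" "$3"
}

hf_api_path() {       # args: repo_type repo_id
    printf '/api/%ss/%s\n' "$1" "$2"
}

hf_tree_path() {      # args: repo_type repo_id revision
    printf '/api/%ss/%s/tree/%s\n' "$1" "$2" "$3"
}

# Example: prefix a path with the content URL of the distribution.
BASE="http://your-pulp-instance/pulp/content/huggingface"
echo "${BASE}$(hf_resolve_path microsoft/DialoGPT-medium main config.json)"
```

The final line prints the same download URL as the "Download a model file" example in Accessing Content.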
Configuration Options
Remote Configuration
- hf_hub_url: Base URL for Hugging Face Hub (default: https://huggingface.co)
- hf_token: Authentication token for private repositories
- policy: Set to on_demand to enable pull-through caching
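As an illustration of how these options combine, the sketch below builds the JSON body for a remote and includes hf_token only when an HF_TOKEN environment variable is set, which keeps the token out of the script itself. The helper name remote_payload and the HF_TOKEN variable are assumptions made for this example, not something the plugin defines.

```shell
# Hypothetical helper: print a JSON body for creating a remote.
# args: name [hf_hub_url]; the optional token is read from $HF_TOKEN.
# Values are inserted verbatim, so they must not contain quotes or backslashes.
remote_payload() {
    name=$1
    url=${2:-https://huggingface.co}
    if [ -n "${HF_TOKEN:-}" ]; then
        printf '{"name": "%s", "hf_hub_url": "%s", "policy": "on_demand", "hf_token": "%s"}\n' \
            "$name" "$url" "$HF_TOKEN"
    else
        printf '{"name": "%s", "hf_hub_url": "%s", "policy": "on_demand"}\n' \
            "$name" "$url"
    fi
}

# Usage with the endpoint from "Setting up a Remote" (not run automatically):
#   curl -X POST http://localhost:5001/pulp/api/v3/remotes/hugging_face/hugging-face/ \
#     -H "Content-Type: application/json" -u admin:password \
#     -d "$(remote_payload hf-private)"
```

For values that may contain special characters, building the body with jq (already used above to extract the remote href) would be the safer choice.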
Content Types
The plugin handles various Hugging Face content types:
- Models: PyTorch models, TensorFlow models, configuration files
- Datasets: Training data, evaluation data, data descriptions
- Spaces: Gradio apps, Streamlit apps, static sites
Development
Setting up Development Environment
git clone https://github.com/pulp/pulp_hugging_face.git
cd pulp_hugging_face
pip install -e .
Running Tests
pytest
CLI Support (TODO)
CLI support for this plugin is planned but not yet implemented. Current status:
- ✅ REST API: Full functionality via /pulp/api/v3/
- ❌ CLI Commands: pulp hugging-face commands not yet available
- ❌ Client Libraries: Python/Ruby clients not yet generated
To add CLI support, the following would need to be implemented:
- CLI command definitions in a cli/ directory
- Client library generation
- Integration with the pulp-cli package
For now, use the REST API directly or the provided example script for automation.
How to File an Issue
File issues through this project's GitHub issue tracker and apply the appropriate labels.
WARNING: Is this security related? If so, please follow the Security Disclosures procedure.
Contributing
Contributions are welcome! Please read our contributing guidelines and submit pull requests to our GitHub repository.
License
This project is licensed under the GNU General Public License v2.0 or later.