---
url: /tutorials/tutorial-datapool.md
description: >-
  Hands-on tutorial building, deploying, and consuming a text analysis service
  that processes documents stored in Kipu Quantum Hub Data Pools.
---

# Using Data Pools in Managed Services

In this hands-on tutorial, you'll build a complete text analysis service that processes documents using our Data Pools feature.
You'll learn how to upload datasets, create a service that reads from Data Pools, test it locally, deploy it, and consume it via the SDK.

## What You'll Build

By the end of this tutorial, you'll have created:

* A text analysis service that counts words in documents stored in Data Pools
* A working local development environment
* A deployed service on the platform
* A Python client that consumes your service

The full code of this tutorial is available in the [Implementations `service-using-data-pools`](https://dashboard.hub.kipu-quantum.com/community/implementations/5fc547be-a902-47e8-9914-6feb06e88eb7).

## Prerequisites

* Node.js 20+ installed on your system
* Python 3.11+ installed
* A platform account with a personal access token

**Note:** Replace `<your-token>`, `<your-access-key>`, `<your-secret-access-key>` and other placeholder values with your actual credentials throughout this tutorial.

## Step 1: Set Up Your Development Environment

### 1.1 Install and Configure the CLI

First, let's install the CLI and verify it's working.
If you already have the CLI installed, run this anyway to make sure you have the latest version:

```bash
# Install the current CLI
npm install -g @quantum-hub/qhubctl

# Verify installation
qhubctl --version
```

You should see a version number.
If you get an error, ensure Node.js 20+ is installed.

### 1.2 Install uv Package Manager

We'll use uv, a fast Python package manager, for managing our Python dependencies:

```bash
# Install uv (if not already installed)
# On macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows:
# powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# Verify uv installation
uv --version
```

### 1.3 Authenticate with the platform

Get your personal access token from the platform (Profile → Access Tokens) and authenticate:

```bash
qhubctl login -t <your-personal-access-token>
```

You should see a success message confirming you're logged in.

### 1.4 Create Your Service Project

Let's create a new service project for our text analyzer:

```bash
qhubctl init --name text-analyzer
cd text-analyzer
```

This creates a project structure with:

* `src/program.py` - Your main service logic
* `input/` - Local test data directory
* `qhub.json` - Service configuration
* Other configuration files

### 1.5 Set Up Python Environment

Now initialize a Python environment within our service project:

```bash
# Install dependencies and create a virtual environment in the current directory
uv sync -U

# Activate the environment (optional, uv will handle this automatically)
source .venv/bin/activate  # On Windows: .venv\Scripts\activate.[ps1|bat]
```

## Step 2: Prepare Sample Data

### 2.1 Create Sample Text Files

Let's create some sample documents to analyze. We'll place them in `input/documents/`, where they serve double duty: they'll be uploaded to the Data Pool and also used for local testing:

```bash
# Create the documents directory (used for both upload and local testing)
mkdir -p input/documents
```

Create `input/documents/document1.txt`:

```bash
cat > input/documents/document1.txt << 'EOF'
Quantum computing is a revolutionary technology that harnesses the principles of quantum mechanics.
It promises to solve complex problems that are intractable for classical computers.
Quantum algorithms like Shor's algorithm and Grover's algorithm demonstrate significant speedups.
EOF
```

Create `input/documents/document2.txt`:

```bash
cat > input/documents/document2.txt << 'EOF'
Machine learning and artificial intelligence are transforming industries worldwide.
Deep learning models can process vast amounts of data to identify patterns.
Natural language processing enables computers to understand human language.
EOF
```

Create `input/documents/summary.json` with metadata:

```bash
cat > input/documents/summary.json << 'EOF'
{
  "collection": "Sample Documents",
  "total_files": 2,
  "description": "Demo text files for analysis",
  "created": "2025-08-04"
}
EOF
```

### 2.2 Upload Data to a Data Pool

Now upload the files from `input/documents/` to a Data Pool:

```bash
qhubctl datapool upload -f ./input/documents/document1.txt -f ./input/documents/document2.txt -f ./input/documents/summary.json
```

The CLI will prompt you to create a new Data Pool.
Choose "Yes" and give it a name like `text-analysis-demo`.
**Save the Data Pool ID** that's returned - you'll need it later.
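Later steps refer back to this ID, so it can be convenient to stash it in an environment variable for the current shell session (the value below is a placeholder, not a real ID):

```shell
# Keep the returned Data Pool ID handy for later steps (placeholder value shown)
export DATAPOOL_ID="<your-datapool-id>"
echo "Saved Data Pool ID: $DATAPOOL_ID"
```

Commands later in this tutorial can then reference `$DATAPOOL_ID` instead of pasting the ID each time.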

## Step 3: Implement the Text Analysis Service

The full code of the text analysis service is available in the [Implementations `text-analyzer`](https://dashboard.hub.kipu-quantum.com/community/implementations/2fbca033-e049-40be-b988-82b14f019bc6).

### 3.1 Update the Service Logic

Replace the contents of `src/program.py` with our text analyzer:

```python
from qhub.commons.datapool import DataPool
from pydantic import BaseModel
import json
from typing import Dict, List

class AnalysisRequest(BaseModel):
    files_to_analyze: List[str]
    min_word_length: int = 3

class AnalysisResult(BaseModel):
    total_files: int
    word_counts: Dict[str, int]
    total_words: int
    summary: str

def run(data: AnalysisRequest, documents: DataPool) -> AnalysisResult:
    """Analyze text files from a Data Pool and return word statistics."""
    
    word_counts = {}
    files_processed = 0
    
    for filename in data.files_to_analyze:
        try:
            # Read the text file from Data Pool
            with documents.open(filename, 'r') as f:
                content = f.read()
            
            # Simple word counting
            words = content.lower().split()
            for word in words:
                # Clean word and filter by length
                clean_word = ''.join(char for char in word if char.isalnum())
                if len(clean_word) >= data.min_word_length:
                    word_counts[clean_word] = word_counts.get(clean_word, 0) + 1
            
            files_processed += 1
            
        except FileNotFoundError:
            print(f"Warning: File {filename} not found in Data Pool")
            continue
    
    total_words = sum(word_counts.values())
    
    # Find most common words
    top_words = sorted(word_counts.items(), key=lambda x: x[1], reverse=True)[:5]
    summary = f"Analyzed {files_processed} files. Top words: {dict(top_words)}"
    
    return AnalysisResult(
        total_files=files_processed,
        word_counts=word_counts,
        total_words=total_words,
        summary=summary
    )
```
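To see what the counting loop does in isolation, here is a minimal sketch of the same logic extracted into a standalone function (no Data Pool needed, using a made-up sample string):

```python
# Standalone version of the word-counting loop from run(), for quick experiments
def count_words(content: str, min_word_length: int = 3) -> dict:
    word_counts = {}
    for word in content.lower().split():
        # Keep only alphanumeric characters, then filter by length
        clean_word = ''.join(char for char in word if char.isalnum())
        if len(clean_word) >= min_word_length:
            word_counts[clean_word] = word_counts.get(clean_word, 0) + 1
    return word_counts

counts = count_words("Quantum computing, quantum mechanics!", min_word_length=4)
print(counts)  # {'quantum': 2, 'computing': 1, 'mechanics': 1}
```

Note how punctuation is stripped before counting, so `computing,` and `mechanics!` are counted as plain words.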

### 3.2 Make Your Initial Commit to Track Your Changes \[Optional]

To track your changes, initialize a Git repository and commit your code:

```bash
git init
git add .
git commit -m "Initial commit: Implement text analysis service"
```

## Step 4: Test Locally

### 4.1 Set Up Local Test Environment

Create test input in `input/data.json`:

```bash
cat > input/data.json << 'EOF'
{
  "files_to_analyze": ["document1.txt", "document2.txt"],
  "min_word_length": 4
}
EOF
```

### 4.2 Update Local Test Runner

Replace `src/__main__.py` to test with our Data Pool:

```python
import json
import os
from qhub.commons.constants import OUTPUT_DIRECTORY_ENV
from qhub.commons.datapool import DataPool
from qhub.commons.json import any_to_json
from qhub.commons.logging import init_logging
from .program import AnalysisRequest, run

init_logging()

# Set up output directory for local testing
directory = "./out"
os.makedirs(directory, exist_ok=True)
os.environ[OUTPUT_DIRECTORY_ENV] = directory

# Load test data
with open("./input/data.json") as file:
    data = AnalysisRequest.model_validate(json.load(file))

# Simulate DataPool injection using local directory
result = run(data, documents=DataPool("./input/documents"))

print("Analysis Results:")
print(any_to_json(result))
```

### 4.3 Run Local Test

Test your service locally:

```bash
python -m src
```

You should see output showing the word analysis results from your sample documents.
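The printed JSON follows the `AnalysisResult` model defined in `src/program.py`; its shape looks roughly like this (the values below are illustrative, the exact counts depend on your sample files):

```json
{
  "total_files": 2,
  "word_counts": {
    "quantum": 3,
    "learning": 2
  },
  "total_words": 42,
  "summary": "Analyzed 2 files. Top words: {...}"
}
```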

## Step 5: Deploy Your Service

### 5.1 Generate OpenAPI Specification

```bash
qhubctl openapi
```

### 5.2 Deploy Your Service to the Platform

You have two options for deployment: using the CLI or the web UI.

#### 5.2.1 Deploy via CLI

To deploy your service using the CLI, run:

```bash
qhubctl up
```

#### 5.2.2 Deploy via Web UI

Alternatively, you can deploy via the platform web interface.
To do so, first compress your service files into a ZIP archive:

```bash
qhubctl compress
```

1. Go to the platform web interface and navigate to services: <https://dashboard.hub.kipu-quantum.com/services>
2. Click on `Create Service`
3. Select your ZIP file at `Source` > `File`
4. Configure the service:
   * Set service name: "Text Analyzer with Data Pools"
   * Add a Data Pool parameter named `documents`
5. Publish the service

**Save your service ID** - you'll need it for the next steps.

## Step 6: Test Your Deployed Service

### 6.1 Create a Request Body

Create a file called `service-request.json` with the Data Pool reference:

```bash
cat > service-request.json << 'EOF'
{
  "data": {
    "files_to_analyze": ["document1.txt", "document2.txt"],
    "min_word_length": 3
  },
  "documents": {
    "id": "<your-datapool-id>",
    "ref": "DATAPOOL"
  }
}
EOF
```

Replace `<your-datapool-id>` with the Data Pool ID from Step 2.2.

### 6.2 Test the Execution Using the UI

Executing Jobs with Data Pools as input is currently not available.
Instead, you need to publish your service and invoke it via an Application.
Follow these steps:

1. Go to the services page in the platform app: <https://dashboard.hub.kipu-quantum.com/services> and navigate to your service.
2. Click on `Publish Service` and `Publish internally`.
3. Go to the Applications page: <https://dashboard.hub.kipu-quantum.com/applications> and create a new Application (or reuse an existing one).
4. Navigate to the Application you want to use.
5. Click on `Subscribe Internally` and select your new service.
6. After subscribing, you can test your service by clicking on `Try it out`.
7. Open the `POST` element in the OpenAPI specification.
8. Click again on `Try it out` and paste the content of `service-request.json` into the request body.
9. Click the `Execute` button under the body to run the service.
10. Navigate to the Application again and click on `Activity Logs` for your service's subscription.
11. Select the latest execution and click on `Show Logs`.
12. You should see the execution logs, including the analysis results similar to the local execution.

## Step 7: Build a Python Client

The full code of the client is available in the [Implementations `text-analyzer-client`](https://dashboard.hub.kipu-quantum.com/community/implementations/0da165d7-0a22-4f64-8474-afcad5cfb27b).

### 7.1 Set Up Client Environment

Create a separate directory for your client:

```bash
cd ..
mkdir text-analyzer-client
cd text-analyzer-client

# Set up Python environment
uv init && uv sync -U
uv add qhub-service python-dotenv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate.ps1
```

### 7.2 Configure Client Credentials

Create a `.env` file with your application credentials.
The `ACCESS_KEY_ID` and `SECRET_ACCESS_KEY` come from the settings page of the application you created in the previous steps, the `SERVICE_ENDPOINT` can be copied from the subscription of your service inside the application details, and the `DATAPOOL_ID` is the one you saved in Step 2.2.

```bash
cat > .env << 'EOF'
SERVICE_ENDPOINT=<your-service-endpoint>
ACCESS_KEY_ID=<your-access-key-id>
SECRET_ACCESS_KEY=<your-secret-access-key>
DATAPOOL_ID=<your-datapool-id>
EOF
```

### 7.3 Create the Client Script

Create `analyze_client.py`:

```python
import os
from dotenv import load_dotenv
from qhub.service.client import HubServiceClient
from qhub.service.datapool import DataPoolReference

# Load environment variables
load_dotenv()

# Initialize the client
client = HubServiceClient(
    os.getenv("SERVICE_ENDPOINT"),
    os.getenv("ACCESS_KEY_ID"),
    os.getenv("SECRET_ACCESS_KEY")
)

def analyze_documents(files_to_analyze, min_word_length=3):
    """Run text analysis on documents in the Data Pool."""
    
    # Create Data Pool reference
    documents = DataPoolReference(id=os.getenv("DATAPOOL_ID"))
    
    # Prepare request
    request_body = {
        "data": {
            "files_to_analyze": files_to_analyze,
            "min_word_length": min_word_length
        },
        "documents": documents
    }
    
    print("Starting analysis...")
    
    # Execute the service
    execution = client.run(request=request_body)
    
    print(f"Execution started with ID: {execution.id}")
    print("Waiting for completion...")
    
    # Wait for completion
    execution.wait_for_final_state(timeout=300)
    
    if execution.status == "SUCCEEDED":
        result = execution.result()
        print("\n=== Analysis Results ===")
        print(f"Status: {execution.status}")
        print(f"Files processed: {result.total_files}")
        print(f"Total words found: {result.total_words}")
        print(f"Summary: {result.summary}")
        
        # Show top 10 most common words
        word_counts = result.word_counts
        top_words = sorted(word_counts.items(), key=lambda x: x[1], reverse=True)[:10]
        print("\nTop 10 most common words:")
        for word, count in top_words:
            print(f"  {word}: {count}")
            
    else:
        print(f"Execution failed with status: {execution.status}")
        logs = execution.logs()
        print("Error logs:")
        for log in logs[-5:]:  # Show last 5 log entries
            print(f"  {log}")

if __name__ == "__main__":
    # Analyze our sample documents
    analyze_documents(
        files_to_analyze=["document1.txt", "document2.txt"],
        min_word_length=4
    )
```

### 7.4 Run the Client

```bash
python analyze_client.py
```

You should see the text analysis results from your deployed service!

## Step 8: Advanced Usage

### 8.1 Add More Documents

Upload additional documents to your Data Pool:

```bash
cd ../text-analyzer

# Create a new document
cat > input/documents/document3.txt << 'EOF'
Cloud computing provides scalable infrastructure for modern applications.
Microservices architecture enables independent deployment and scaling.
Container orchestration platforms manage distributed systems efficiently.
EOF

# Upload to existing Data Pool
qhubctl datapool upload -f ./input/documents/document3.txt --datapool-id <your-datapool-id>
```

### 8.2 Analyze New Documents

Update your client to analyze the new document:

```python
# In analyze_client.py, change the files list:
analyze_documents(
    files_to_analyze=["document1.txt", "document2.txt", "document3.txt"],
    min_word_length=5
)
```

### 8.3 Monitor Execution Progress

Add progress monitoring to your client:

```python
import time

def analyze_with_monitoring(files_to_analyze, min_word_length=3):
    """Run analysis with real-time status monitoring."""
    
    documents_ref = DataPoolReference(id=os.getenv("DATAPOOL_ID"))
    
    request_body = {
        "data": {
            "files_to_analyze": files_to_analyze,
            "min_word_length": min_word_length
        },
        "documents": documents_ref
    }
    
    execution = client.run(request=request_body)
    print(f"Started execution: {execution.id}")
    
    # Poll the status every 2 seconds until the execution reaches a final state
    while not execution.has_finished:
        print(f"Status: {execution.status}")
        time.sleep(2)
    
    print(f"Final status: {execution.status}")
    
    if execution.status == "SUCCEEDED":
        return execution.result()
    else:
        print("Execution failed")
        return None
```

Then update the main block to use this function:

```python
if __name__ == "__main__":
    result = analyze_with_monitoring(
        files_to_analyze=["document1.txt", "document2.txt", "document3.txt"],
        min_word_length=5
    )

    if result:
        print(result)
    else:
        print("No results returned.")
```

And run it again:

```bash
python analyze_client.py
```

You should see real-time status updates as your service processes the documents.

## What You've Accomplished

🎉 **Congratulations!** You've successfully:

1. ✅ Set up the CLI and authenticated
2. ✅ Created sample data and uploaded it to a Data Pool
3. ✅ Built a text analysis service that reads from Data Pools
4. ✅ Tested your service locally with simulated Data Pools
5. ✅ Deployed your service to the platform
6. ✅ Created a Python client that consumes your service
7. ✅ Learned how to monitor executions and handle results

## Key Concepts Learned

* **Data Pools**: Managed file collections that can be mounted into services
* **Local Testing**: Simulating Data Pools with local directories
* **Service Parameters**: How Data Pool parameters are injected into your service
* **SDK Integration**: Using a `DataPoolReference` to pass Data Pools to services
* **Error Handling**: Managing file not found errors and execution failures

## Next Steps

* Try uploading larger datasets (remember the 500 MB per file limit)
* Experiment with different analysis algorithms
* Build services that write results back to output Data Pools
* Explore the workflow orchestration features for multistep data processing

## References

\[CLI] [CLI Reference | Docs](https://docs.hub.kipu-quantum.com/cli-reference.html)

\[DataPool] [Using Data Pools in Services | Docs](https://docs.hub.kipu-quantum.com/services/managed/datapool.html)

\[SDK] [Service SDK Reference | Docs](https://docs.hub.kipu-quantum.com/sdk-reference-service.html)
