---
url: /tutorials/tutorial-datapool.md
description: >-
  Hands-on tutorial building, deploying, and consuming a text analysis service
  that processes documents stored in Kipu Quantum Hub Data Pools.
---

# Using Data Pools in Managed Services

In this hands-on tutorial, you'll build a complete text analysis service that processes documents using our Data Pools feature.
You'll learn how to upload datasets, create a service that reads from Data Pools, test it locally, deploy it, and consume it via the SDK.

## What You'll Build

By the end of this tutorial, you'll have created:

* A text analysis service that counts words in documents stored in Data Pools
* A working local development environment
* A deployed service on the platform
* A Python client that consumes your service

The full code of this tutorial is available in the [Implementations `service-using-data-pools`](https://dashboard.hub.kipu-quantum.com/community/implementations/5fc547be-a902-47e8-9914-6feb06e88eb7).

## Prerequisites

* Node.js 20+ installed on your system
* Python 3.11+ installed
* A platform account with a personal access token

**Note:** Replace `<your-token>`, `<your-access-key>`, `<your-secret-access-key>` and other placeholder values with your actual credentials throughout this tutorial.

## Step 1: Set Up Your Development Environment

### 1.1 Install and Configure the CLI

First, let's install the CLI and verify it's working.
If you already have the CLI installed, run this anyway to make sure you have the latest version:

```bash
# Install the current CLI
npm install -g @quantum-hub/qhubctl

# Verify installation
qhubctl --version
```

You should see a version number.
If you get an error, ensure Node.js 20+ is installed.

### 1.2 Install uv Package Manager

We'll use uv, a fast Python package manager, for managing our Python dependencies:

```bash
# Install uv (if not already installed)
# On macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows:
# powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# Verify uv installation
uv --version
```

### 1.3 Authenticate with the platform

Get your personal access token from the platform (Profile → Access Tokens) and authenticate:

```bash
qhubctl login -t <your-personal-access-token>
```

You should see a success message confirming you're logged in.

### 1.4 Create Your Service Project

Let's create a new service project for our text analyzer:

```bash
qhubctl init --name text-analyzer
cd text-analyzer
```

This creates a project structure with:

* `src/program.py` - Your main service logic
* `input/` - Local test data directory
* `qhub.json` - Service configuration
* Other configuration files

### 1.5 Set Up Python Environment

Now initialize a Python environment within our service project:

```bash
# Install dependencies and create a virtual environment in the current directory
uv sync -U

# Activate the environment (optional, uv will handle this automatically)
source .venv/bin/activate  # On Windows: .venv\Scripts\activate.[ps1|bat]
```

## Step 2: Prepare Sample Data

### 2.1 Create Sample Text Files

Let's create some sample documents to analyze. We'll place them in `input/documents/`, where they serve double duty: they'll be uploaded to the Data Pool and also used for local testing:

```bash
# Create the documents directory (used for both upload and local testing)
mkdir -p input/documents
```

Create `input/documents/document1.txt`:

```bash
cat > input/documents/document1.txt << 'EOF'
Quantum computing is a revolutionary technology that harnesses the principles of quantum mechanics.
It promises to solve complex problems that are intractable for classical computers.
Quantum algorithms like Shor's algorithm and Grover's algorithm demonstrate significant speedups.
EOF
```

Create `input/documents/document2.txt`:

```bash
cat > input/documents/document2.txt << 'EOF'
Machine learning and artificial intelligence are transforming industries worldwide.
Deep learning models can process vast amounts of data to identify patterns.
Natural language processing enables computers to understand human language.
EOF
```

Create `input/documents/summary.json` with metadata:

```bash
cat > input/documents/summary.json << 'EOF'
{
  "collection": "Sample Documents",
  "total_files": 2,
  "description": "Demo text files for analysis",
  "created": "2025-08-04"
}
EOF
```

### 2.2 Upload Data to a Data Pool

Now upload the files from `input/documents/` to a Data Pool:

```bash
qhubctl datapool upload -f ./input/documents/document1.txt -f ./input/documents/document2.txt -f ./input/documents/summary.json
```

The CLI will prompt you to create a new Data Pool.
Choose "Yes" and give it a name like `text-analysis-demo`.
**Save the Data Pool ID** that's returned - you'll need it later.
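Later steps refer back to this ID, so it can be convenient to stash it in an environment variable for the current shell session (the value below is a placeholder, not a real ID):

```shell
# Keep the returned Data Pool ID handy for later steps (placeholder value shown)
export DATAPOOL_ID="<your-datapool-id>"
echo "Saved Data Pool ID: $DATAPOOL_ID"
```

Commands later in this tutorial can then reference `$DATAPOOL_ID` instead of pasting the ID each time.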

## Step 3: Implement the Text Analysis Service

The full code of the text analysis service is available in the [Implementations `text-analyzer`](https://dashboard.hub.kipu-quantum.com/community/implementations/2fbca033-e049-40be-b988-82b14f019bc6).

### 3.1 Update the Service Logic

Replace the contents of `src/program.py` with our text analyzer:

```python
from qhub.commons.datapool import DataPool
from pydantic import BaseModel
import json
from typing import Dict, List

class AnalysisRequest(BaseModel):
    files_to_analyze: List[str]
    min_word_length: int = 3

class AnalysisResult(BaseModel):
    total_files: int
    word_counts: Dict[str, int]
    total_words: int
    summary: str

def run(data: AnalysisRequest, documents: DataPool) -> AnalysisResult:
    """Analyze text files from a Data Pool and return word statistics."""
    
    word_counts = {}
    files_processed = 0
    
    for filename in data.files_to_analyze:
        try:
            # Read the text file from Data Pool
            with documents.open(filename, 'r') as f:
                content = f.read()
            
            # Simple word counting
            words = content.lower().split()
            for word in words:
                # Clean word and filter by length
                clean_word = ''.join(char for char in word if char.isalnum())
                if len(clean_word) >= data.min_word_length:
                    word_counts[clean_word] = word_counts.get(clean_word, 0) + 1
            
            files_processed += 1
            
        except FileNotFoundError:
            print(f"Warning: File {filename} not found in Data Pool")
            continue
    
    total_words = sum(word_counts.values())
    
    # Find most common words
    top_words = sorted(word_counts.items(), key=lambda x: x[1], reverse=True)[:5]
    summary = f"Analyzed {files_processed} files. Top words: {dict(top_words)}"
    
    return AnalysisResult(
        total_files=files_processed,
        word_counts=word_counts,
        total_words=total_words,
        summary=summary
    )
```
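To see what the counting loop does in isolation, here is a minimal sketch of the same logic extracted into a standalone function (no Data Pool needed, using a made-up sample string):

```python
# Standalone version of the word-counting loop from run(), for quick experiments
def count_words(content: str, min_word_length: int = 3) -> dict:
    word_counts = {}
    for word in content.lower().split():
        # Keep only alphanumeric characters, then filter by length
        clean_word = ''.join(char for char in word if char.isalnum())
        if len(clean_word) >= min_word_length:
            word_counts[clean_word] = word_counts.get(clean_word, 0) + 1
    return word_counts

counts = count_words("Quantum computing, quantum mechanics!", min_word_length=4)
print(counts)  # {'quantum': 2, 'computing': 1, 'mechanics': 1}
```

Note how punctuation is stripped before counting, so `computing,` and `mechanics!` are counted as plain words.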

### 3.2 Make Your Initial Commit to Track Your Changes \[Optional]

To track your changes, initialize a Git repository and commit your code:

```bash
git init
git add .
git commit -m "Initial commit: Implement text analysis service"
```

## Step 4: Test Locally

### 4.1 Set Up Local Test Environment

Create test input in `input/data.json`:

```bash
cat > input/data.json << 'EOF'
{
  "files_to_analyze": ["document1.txt", "document2.txt"],
  "min_word_length": 4
}
EOF
```

### 4.2 Update Local Test Runner

Replace `src/__main__.py` to test with our Data Pool:

```python
import json
import os
from qhub.commons.constants import OUTPUT_DIRECTORY_ENV
from qhub.commons.datapool import DataPool
from qhub.commons.json import any_to_json
from qhub.commons.logging import init_logging
from .program import AnalysisRequest, run

init_logging()

# Set up output directory for local testing
directory = "./out"
os.makedirs(directory, exist_ok=True)
os.environ[OUTPUT_DIRECTORY_ENV] = directory

# Load test data
with open("./input/data.json") as file:
    data = AnalysisRequest.model_validate(json.load(file))

# Simulate DataPool injection using local directory
result = run(data, documents=DataPool("./input/documents"))

print("Analysis Results:")
print(any_to_json(result))
```

### 4.3 Run Local Test

Test your service locally:

```bash
python -m src
```

You should see output showing the word analysis results from your sample documents.
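The printed JSON follows the `AnalysisResult` model defined in `src/program.py`; its shape looks roughly like this (the values below are illustrative, the exact counts depend on your sample files):

```json
{
  "total_files": 2,
  "word_counts": {
    "quantum": 3,
    "learning": 2
  },
  "total_words": 42,
  "summary": "Analyzed 2 files. Top words: {...}"
}
```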

## Step 5: Deploy Your Service

### 5.1 Generate OpenAPI Specification

```bash
qhubctl openapi
```

### 5.2 Deploy Your Service to the Platform

You have two options for deployment: using the CLI or the web UI.

#### 5.2.1 Deploy via CLI

To deploy your service using the CLI, run:

```bash
qhubctl up
```

#### 5.2.2 Deploy via Web UI

Alternatively, you can deploy via the platform web interface.
To do so, first compress your service files into a ZIP archive:

```bash
qhubctl compress
```

1. Go to the platform web interface and navigate to services: <https://dashboard.hub.kipu-quantum.com/services>
2. Click on `Create Service`
3. Select your ZIP file at `Source` > `File`
4. Configure the service:
   * Set service name: "Text Analyzer with Data Pools"
   * Add a Data Pool parameter named `documents`
5. Publish the service

**Save your service ID** - you'll need it for the next steps.

## Step 6: Test Your Deployed Service

### 6.1 Create a Request Body

Create a file called `service-request.json` with the Data Pool reference:

```bash
cat > service-request.json << 'EOF'
{
  "data": {
    "files_to_analyze": ["document1.txt", "document2.txt"],
    "min_word_length": 3
  },
  "documents": {
    "id": "<your-datapool-id>",
    "ref": "DATAPOOL"
  }
}
EOF
```

Replace `<your-datapool-id>` with the Data Pool ID from Step 2.2.

### 6.2 Test the Execution Using the UI

Executing Jobs with Data Pools as input is currently not available.
Instead, you need to publish your service and invoke it via an Application.
Follow these steps:

1. Go to the services page in the platform app: <https://dashboard.hub.kipu-quantum.com/services> and navigate to your service.
2. Click on `Publish Service` and `Publish internally`.
3. Go to the Applications page: <https://dashboard.hub.kipu-quantum.com/applications> and create a new Application (or reuse an existing one).
4. Navigate to the Application you want to use.
5. Click on `Subscribe Internally` and select your new service.
6. After subscribing, you can test your service by clicking on `Try it out`.
7. Open the `POST` element in the OpenAPI specification.
8. Click again on `Try it out` and paste the content of `service-request.json` into the request body.
9. Click the `Execute` button under the body to run the service.
10. Navigate to the Application again and click on `Activity Logs` for your service's subscription.
11. Select the latest execution and click on `Show Logs`.
12. You should see the execution logs, including the analysis results similar to the local execution.

## Step 7: Build a Python Client

The full code of the client is available in the [Implementations `text-analyzer-client`](https://dashboard.hub.kipu-quantum.com/community/implementations/0da165d7-0a22-4f64-8474-afcad5cfb27b).

### 7.1 Set Up Client Environment

Create a separate directory for your client:

```bash
cd ..
mkdir text-analyzer-client
cd text-analyzer-client

# Set up Python environment
uv init && uv sync -U
uv add qhub-service python-dotenv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate.ps1
```

### 7.2 Configure Client Credentials

Create a `.env` file with your application credentials.
The `ACCESS_KEY_ID` and `SECRET_ACCESS_KEY` come from the settings page of the application you created in the previous steps, the `SERVICE_ENDPOINT` can be copied from the subscription of your service inside the application details, and the `DATAPOOL_ID` is the one you saved in Step 2.2.

```bash
cat > .env << 'EOF'
SERVICE_ENDPOINT=<your-service-endpoint>
ACCESS_KEY_ID=<your-access-key-id>
SECRET_ACCESS_KEY=<your-secret-access-key>
DATAPOOL_ID=<your-datapool-id>
EOF
```

### 7.3 Create the Client Script

Create `analyze_client.py`:

```python
import os
from dotenv import load_dotenv
from qhub.service.client import HubServiceClient
from qhub.service.datapool import DataPoolReference

# Load environment variables
load_dotenv()

# Initialize the client
client = HubServiceClient(
    os.getenv("SERVICE_ENDPOINT"),
    os.getenv("ACCESS_KEY_ID"),
    os.getenv("SECRET_ACCESS_KEY")
)

def analyze_documents(files_to_analyze, min_word_length=3):
    """Run text analysis on documents in the Data Pool."""
    
    # Create Data Pool reference
    documents = DataPoolReference(id=os.getenv("DATAPOOL_ID"))
    
    # Prepare request
    request_body = {
        "data": {
            "files_to_analyze": files_to_analyze,
            "min_word_length": min_word_length
        },
        "documents": documents
    }
    
    print("Starting analysis...")
    
    # Execute the service
    execution = client.run(request=request_body)
    
    print(f"Execution started with ID: {execution.id}")
    print("Waiting for completion...")
    
    # Wait for completion
    execution.wait_for_final_state(timeout=300)
    
    if execution.status == "SUCCEEDED":
        result = execution.result()
        print("\n=== Analysis Results ===")
        print(f"Status: {execution.status}")
        print(f"Files processed: {result.total_files}")
        print(f"Total words found: {result.total_words}")
        print(f"Summary: {result.summary}")
        
        # Show top 10 most common words
        word_counts = result.word_counts
        top_words = sorted(word_counts.items(), key=lambda x: x[1], reverse=True)[:10]
        print("\nTop 10 most common words:")
        for word, count in top_words:
            print(f"  {word}: {count}")
            
    else:
        print(f"Execution failed with status: {execution.status}")
        logs = execution.logs()
        print("Error logs:")
        for log in logs[-5:]:  # Show last 5 log entries
            print(f"  {log}")

if __name__ == "__main__":
    # Analyze our sample documents
    analyze_documents(
        files_to_analyze=["document1.txt", "document2.txt"],
        min_word_length=4
    )
```

### 7.4 Run the Client

```bash
python analyze_client.py
```

You should see the text analysis results from your deployed service!

## Step 8: Advanced Usage

### 8.1 Add More Documents

Upload additional documents to your Data Pool:

```bash
cd ../text-analyzer

# Create a new document
cat > input/documents/document3.txt << 'EOF'
Cloud computing provides scalable infrastructure for modern applications.
Microservices architecture enables independent deployment and scaling.
Container orchestration platforms manage distributed systems efficiently.
EOF

# Upload to existing Data Pool
qhubctl datapool upload -f ./input/documents/document3.txt --datapool-id <your-datapool-id>
```

### 8.2 Analyze New Documents

Update your client to analyze the new document:

```python
# In analyze_client.py, change the files list:
analyze_documents(
    files_to_analyze=["document1.txt", "document2.txt", "document3.txt"],
    min_word_length=5
)
```

### 8.3 Monitor Execution Progress

Add progress monitoring to your client:

```python
import time

def analyze_with_monitoring(files_to_analyze, min_word_length=3):
    """Run analysis with real-time status monitoring."""
    
    documents_ref = DataPoolReference(id=os.getenv("DATAPOOL_ID"))
    
    request_body = {
        "data": {
            "files_to_analyze": files_to_analyze,
            "min_word_length": min_word_length
        },
        "documents": documents_ref
    }
    
    execution = client.run(request=request_body)
    print(f"Started execution: {execution.id}")
    
    # Poll the status every 2 seconds until the execution reaches a final state
    while not execution.has_finished:
        print(f"Status: {execution.status}")
        time.sleep(2)
    
    print(f"Final status: {execution.status}")
    
    if execution.status == "SUCCEEDED":
        return execution.result()
    else:
        print("Execution failed")
        return None
```

Then update the main block to use this function:

```python
if __name__ == "__main__":
    result = analyze_with_monitoring(
        files_to_analyze=["document1.txt", "document2.txt", "document3.txt"],
        min_word_length=5
    )

    if result:
        print(result)
    else:
        print("No results returned.")
```

And run it again:

```bash
python analyze_client.py
```

You should see real-time status updates as your service processes the documents.

## What You've Accomplished

🎉 **Congratulations!** You've successfully:

1. ✅ Set up the CLI and authenticated
2. ✅ Created sample data and uploaded it to a Data Pool
3. ✅ Built a text analysis service that reads from Data Pools
4. ✅ Tested your service locally with simulated Data Pools
5. ✅ Deployed your service to the platform
6. ✅ Created a Python client that consumes your service
7. ✅ Learned how to monitor executions and handle results

## Key Concepts Learned

* **Data Pools**: Managed file collections that can be mounted into services
* **Local Testing**: Simulating Data Pools with local directories
* **Service Parameters**: How Data Pool parameters are injected into your service
* **SDK Integration**: Using a `DataPoolReference` to pass Data Pools to services
* **Error Handling**: Managing file not found errors and execution failures

## Next Steps

* Try uploading larger datasets (remember the 500 MB per file limit)
* Experiment with different analysis algorithms
* Build services that write results back to output Data Pools
* Explore the workflow orchestration features for multistep data processing

## References

\[CLI] [CLI Reference | Docs](https://docs.hub.kipu-quantum.com/cli-reference.html)

\[DataPool] [Using Data Pools in Services | Docs](https://docs.hub.kipu-quantum.com/services/managed/datapool.html)

\[SDK] [Service SDK Reference | Docs](https://docs.hub.kipu-quantum.com/sdk-reference-service.html)
