AI from deep cloud to far-edge: A flawless end-to-end experience
IBM and Qualcomm Technologies join forces.
This post originally appeared on Qualcomm's developer blog.
In an era where AI is becoming increasingly ubiquitous, the need for powerful, flexible, and responsible AI solutions has never been greater. Moreover, there is a plethora of AI models out there ready to be used, whether from open-source platforms or from proprietary, business-grade sources. Users of AI-enabled edge devices such as mobile phones and cars, as well as suppliers of AI-based solutions, are becoming increasingly concerned about the outcomes of modern AI and the manageability and use of data.
The best model won’t help if there are no sufficient, seamless capabilities for managing the lifecycle of models and apps end-to-end, from the enterprise to the edge device. The situation becomes even more challenging when dealing with models intended to operate at the edge. Among other candidates, the family of IBM Granite models is perfectly suited to the task: not only are their performance and accuracy state of the art, but the models are also compact enough (under 10B parameters) to be deployed where they matter most for specific use cases, on the devices themselves.
In this technical blog, we explore the benefits of the collaboration between IBM watsonx and Qualcomm Technologies, designed to empower developers and businesses to harness the full potential of AI at the edge with an enterprise-grade tool suite that integrates edge to cloud.
Next-generation AI and data platforms are transforming the way organizations approach AI development and deployment. The collaboration between IBM and Qualcomm Technologies extends IBM’s watsonx capabilities to the edge, creating a seamless pipeline from model development to on-device deployment using the Qualcomm AI Hub platform. Combining IBM’s strengths in enterprise software with Qualcomm Technologies’ leading edge technology delivers trustworthy AI on edge devices.
- Rapid Prototyping and Deployment: Use watsonx.ai and InstructLab to quickly develop and fine-tune models, then seamlessly deploy them via Qualcomm AI Hub.
- End-to-end AI Lifecycle Management: From data preparation to model deployment, our integrated solution covers every step of the AI journey.
- Responsible AI at Scale: Leverage watsonx.governance to ensure that the prompts are safe and free from harm.
- Optimized Edge Performance: Harness Qualcomm Technologies’ expertise in edge computing to deploy high-performance, energy-efficient AI models on a wide range of devices.
- Model Optimization: Leverage Qualcomm AI Hub’s automatic conversion and optimization of PyTorch or ONNX models for efficient on-device deployment using TensorFlow Lite, ONNX Runtime, or our proprietary Qualcomm AI Engine Direct SDK.
- Pre-commercial and commercial device access: Access pre-commercial and commercial Qualcomm Technologies and Snapdragon platforms for testing on real physical devices.
IBM watsonx is a one-stop, integrated AI platform that offers a comprehensive suite of tools for developing, deploying, and managing AI solutions.
- Build with ease: Develop powerful AI solutions with user-friendly interfaces, workflows, and access to industry-standard APIs and SDKs.
- Find it all in the integrated studio: Get one-stop access to capabilities that span the AI development lifecycle with built-in performance and scalability.
- Collaborate in a generative AI toolkit: Unlock innovations for the AI builder through a collaborative development experience — with or without code.
- Work and deploy where you want: Quickly build, run and manage gen AI applications in the hybrid cloud platform of your choice.
InstructLab, our innovative tool within watsonx.ai, allows developers to:
- Rapidly prototype: Quickly experiment with different prompts and model configurations.
- Fine-tune with ease: Adapt foundation models to specific edge use cases with minimal coding.
- Collaborate efficiently: Share prompts and results with team members, fostering a collaborative AI development environment.
Effective edge AI requires not just powerful models, but also well-managed data. watsonx.data offers:
- Unified data access: Easily integrate data from various sources to train your edge AI models.
- Data governance: Ensure data quality and compliance throughout the AI lifecycle.
- Scalable storage: Handle large datasets required for training complex edge AI models.
In today's regulatory landscape, responsible AI is not just an option—it's a necessity. watsonx.governance provides:
- AI Fairness: Detect and mitigate bias in your models before deployment.
- Explainability: Understand how your models make decisions, crucial for edge applications in sectors like healthcare or finance.
- Compliance monitoring: Ensure your edge AI solutions adhere to relevant regulations and company policies.
Qualcomm AI Hub: accelerating on-device AI development and deployment
Qualcomm AI Hub is a developer-centric platform designed to streamline on-device AI development and deployment. It contains a library of over 100 pre-optimized AI models for Snapdragon and Qualcomm Technologies’ platforms for superior on-device AI performance, lower memory utilization, and better power efficiency. The optimized models are available today for mobile, compute, automotive and IoT platforms on the Qualcomm AI Hub, GitHub, and Hugging Face.
Additionally, Qualcomm AI Hub allows developers to “Bring Your Own Model” (BYOM), for automatic conversion of PyTorch or ONNX models for efficient on-device deployment using TensorFlow Lite, ONNX Runtime, or our proprietary Qualcomm AI Engine Direct SDK.
- Optimized Model Performance: A unique conversion pipeline delivers optimized, compiled models ready for deployment on our platforms, enabling high performance and power efficiency.
- Extensive Model Collection: 100+ pre-optimized AI models.
- Rich Tooling: A rich set of tools to curate third-party models (BYOM) or create models based on developers’ own data (BYOD).
- Pre-commercial and commercial device access: Developers can now get access to pre-commercial and commercial Qualcomm Technologies and Snapdragon platforms, testing on real physical devices on the Qualcomm AI Hub’s platform.
- Comprehensive Ecosystem: Collaborations with industry leaders like Microsoft, Google, Mistral, AWS, and IBM, combined with our expertise, enable end-to-end ownership of AI on edge devices.
The AI model lifecycle on an edge device involves three phases:
- Develop
- Optimize and test
- Deploy
This process ensures that AI models on edge devices are optimized, deployed efficiently, and maintained for optimal performance through ongoing monitoring and updates.
- Model Development
a. Use case registration: Recording the requirement for a model
b. Model selection: Picking the right model for the use case
c. Data preparation: Selecting the right method to source the data
d. Model training: Training a small model to perform on edge
e. Model evaluation: Test the quality of the inference, such as accuracy or latency
- Model Optimization and Testing
a. Bring your AI model to the Qualcomm AI Hub platform: Leverage the Qualcomm AI Hub BYOM (Bring Your Own Model) feature
b. Optimize model: Automatic model optimization for edge deployment
c. Prepare model for runtime: Select the runtime and device
d. Use model in an app: Develop an app using the optimized, compiled model
- Model Deployment and Monitoring
a. Package app/model into the OEM device: Prepare for deployment
b. Model runs on OEM hardware utilizing local orchestration
c. Complex requests routed via orchestration in the cloud
d. Monitor model for quality: Evaluate the quality of inference at runtime
e. Fine-tune and OTA updates: Continuously improve the model and deploy updates over the air
Let’s delve into how the available tools support the critical steps in the three phases described above:
IBM watsonx.data enhances data preparation for AI workloads in industrial edge environments by providing a unified platform that can manage diverse data types from IoT devices, machinery, and sensors at the edge. Its architecture enables efficient data collection and processing directly at edge locations, minimizing latency and ensuring real-time data consistency across operational processes. By supporting open data formats and machine learning libraries, watsonx.data allows data scientists and engineers to prepare AI models at the edge, driving rapid insights for asset optimization, predictive maintenance, and real-time monitoring.
a) Exploring Data: The Data manager module in IBM watsonx.data allows you to browse, search, and manage schemas and tables by engine, viewing associated catalogs and tables. You can also create schemas and configure tables directly from the web console using Data Definition Language (DDL) commands.
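For illustration, here is a hypothetical DDL snippet of the kind you might run from the web console to set up the IoT table used in the examples below; the iceberg_data catalog name and the column set are assumptions, not prescribed names:

-- Hypothetical DDL: create a schema and a table for IoT sensor data
CREATE SCHEMA IF NOT EXISTS iceberg_data.operations_store;

CREATE TABLE IF NOT EXISTS iceberg_data.operations_store.iot_sensor_data (
    sensor_id    VARCHAR,
    io_timestamp TIMESTAMP,
    temperature  DOUBLE,
    vibration    DOUBLE
);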
b) Ingesting Data: Data ingestion allows users to import and load data through the UI, object storage, or CLI, supporting object stores such as IBM Storage Ceph, AWS S3, and MinIO. Ingestion can be done securely via the Ingest data tab, web console, or CLI, enabling optimized file formats and SQL querying capabilities with Presto.
Sample CLI command to ingest a Parquet file:
ibm-lh data-copy --source-data-files SOURCE_DATA_FILE \
--staging-location s3://lh-target/staging \
--target-tables TARGET_TABLES \
--ingestion-engine-endpoint INGESTION_ENGINE_ENDPOINT \
--dbuser DBUSER \
--dbpassword DBPASSWORD \
--create-if-not-exist
c) Querying Data: SQL queries can be run through the Query workspace interface, supporting data manipulation and visualization with tools like Visual Explain for query execution plans. The Visual Explain feature validates SQL queries and displays execution details in graphical formats, while Query History tracks and audits all past and current queries. The Query History Monitoring and Management (QHMM) service manages diagnostic data, storing query histories and events.
import os
import prestodb

# Get a connection object; connection settings are read from environment variables
presto_conn = prestodb.dbapi.connect(
    host=os.environ["PRESTO_HOST"],
    port=int(os.environ["PRESTO_HOST_PORT"]),
    user=os.environ["PRESTO_USER"],
    catalog=os.environ["PRESTO_CATALOG"],
    schema=os.environ["PRESTO_SCHEMA"],
    http_scheme="https",
    auth=prestodb.auth.BasicAuthentication(
        os.environ["PRESTO_USER"], os.environ["PRESTO_PASSWORD"]
    ),
)

# Time-series query over a window of IoT sensor readings
query = """
SELECT * FROM operations_store.iot_sensor_data
WHERE io_timestamp BETWEEN TIMESTAMP '2024-05-09 17:00:00.000'
  AND TIMESTAMP '2024-05-09 20:12:00.000'
"""

# Execute the query and fetch the results
cursor = presto_conn.cursor()
cursor.execute(query)
data = cursor.fetchall()
cursor.close()
a. Model Development
watsonx.ai offers a comprehensive suite of tools designed to streamline the model development process for AI. It enables data scientists and developers to build, train, and deploy models with ease, leveraging robust machine learning libraries and frameworks. It provides features for model monitoring, version control, and performance optimization, making it a powerful platform for developing sophisticated AI models. watsonx is fully integrated with IBM's cloud infrastructure, ensuring scalable compute power and storage. Worth noting is IBM's open, multi-cloud strategy, which allows other cloud service providers (AWS, Azure) to be utilized as well.
b. Model Fine Tuning
InstructLab allows users to contribute to Large Language Models (LLMs) without needing advanced AI/ML expertise. It overcomes the challenge of high entry barriers, simplifying the process of enhancing generative AI. Through community-driven governance and best practices, InstructLab makes model contributions more accessible and supports regular updates to open-source models without full retraining. IBM’s Granite models with lower parameter counts are prime candidates for use as student models when training and developing models for edge use cases.
Taxonomy-based skill and knowledge representation
The taxonomy creation process for InstructLab involves structuring skills and knowledge contributions into a hierarchical tree with YAML files at the end nodes. Contributors can define skills (e.g., grounded or ungrounded) by creating a qna.yaml file with examples of questions, answers, and optional contexts, alongside an attribution.txt file citing sources. For knowledge contributions, a repository of Markdown files supports detailed contextual information linked through qna.yaml. The taxonomy leverages synthetic data alignment for Large Language Models (LLMs) and organizes content into cascading directories based on domains and subdomains, ensuring clarity and ease of model tuning.
Check the GitHub taxonomy repo for information.
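As a minimal, hypothetical illustration of a skill contribution, a qna.yaml might look like the following; the field names and version number reflect the taxonomy schema at the time of writing, and real contributions typically require more seed examples, so check the repo for the authoritative format:

version: 2
task_description: Answer questions about industrial IoT sensor readings.
created_by: your-github-username
seed_examples:
  - question: What does a sudden rise in vibration readings usually indicate?
    answer: It often points to bearing wear or shaft misalignment and should trigger a maintenance inspection.
  - question: Why are temperature and vibration monitored together?
    answer: Correlated spikes across both signals help distinguish mechanical faults from ambient changes.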
Synthetic Data Generation
InstructLab leverages taxonomy-defined skills or knowledge to create datasets, using a Large Language Model (LLM) to generate synthetic data. Run the ilab data generate command, using GPU acceleration if available, to create synthetic data. The pipeline can be customized to use alternative models or endpoints for generation. The generated dataset is saved in JSONL format within the datasets directory, named skills_train_msgs_<timestamp>.jsonl or knowledge_train_msgs_<timestamp>.jsonl, depending on the type of contribution. Confirm the dataset creation by inspecting the output directory using the ls datasets command.
ilab data generate \
--pipeline full \
--sdg-scale-factor 100 \
--endpoint-url http://localhost:8080/v1 \
--output-dir ./outputdir-watsonxai-endpoint \
--chunk-word-count 1000 \
--num-cpus 8 \
--model ibm/granite-20b-multilingual
Fine-tune the model
Training a model using InstructLab involves several customizable options depending on the system and resources available. The process begins by running the ilab model train command within a Python virtual environment. GPU acceleration, if available, significantly enhances training speed. For advanced workflows, multi-phase training can be used, where the model is trained sequentially on knowledge and skills datasets to optimize performance. Once training is complete, the model can be evaluated to select the best checkpoint and tested to compare performance before and after training.
ilab model train \
--strategy lab-multiphase \
--phased-phase1-data <knowledge train messages jsonl> \
--phased-phase2-data <skills train messages jsonl> -y
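Once training finishes, a quick smoke test is to chat with the newly trained model interactively; the checkpoint path below is a placeholder:

# Sanity-check the fine-tuned model interactively
ilab model chat --model <path-to-trained-checkpoint>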
For more information on InstructLab, visit this article.
c. Application Development
After the model is trained and validated on watsonx, it is published to Qualcomm AI Hub, where it becomes available for OEM developers to build their custom applications. Qualcomm AI Hub enables you to do the following (a code sketch follows this list):
- Compile and optimize the pre-trained PyTorch model into a format that can be run on a device
- Submit a profile job to run inference with the compiled model on a real physical device with a Snapdragon or Qualcomm Technologies’ chipset
- Measure on-device model performance
- Confirm latency and memory are below required targets
- Get insights on which compute units (NPU, GPU and CPU) the model layers are running on
- Verify numerical accuracy of the model with an inference job
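As a rough sketch of what this flow looks like in code, the example below follows the publicly documented qai_hub Python client pattern; the placeholder model, device name, and input shape are illustrative assumptions rather than the specific models used in this collaboration:

import numpy as np
import torch
import qai_hub as hub

# Placeholder network standing in for the model trained and validated on watsonx
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(8, 2),
).eval()

input_shape = (1, 3, 224, 224)
traced_model = torch.jit.trace(model, torch.rand(input_shape))

# 1. Compile and optimize the traced model for a target device
device = hub.Device("Samsung Galaxy S24 (Family)")
compile_job = hub.submit_compile_job(
    model=traced_model,
    device=device,
    input_specs=dict(image=input_shape),
)

# 2. Profile the compiled model on a real hosted device to measure
#    latency, memory, and which compute units (NPU/GPU/CPU) run each layer
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=device,
)

# 3. Run an inference job on-device to verify numerical accuracy
inference_job = hub.submit_inference_job(
    model=compile_job.get_target_model(),
    device=device,
    inputs=dict(image=[np.random.rand(*input_shape).astype(np.float32)]),
)
on_device_output = inference_job.download_output_data()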
watsonx.governance handles lifecycle governance by extending AI best practices from predictive machine learning to generative AI while mitigating risks across models, users, and datasets. It supports responsible AI through explainability, transparency, and compliance with internal and external regulations. watsonx.governance enables real-time monitoring of model performance and fairness, along with management of lifecycle metadata for models and templates, ensuring compliance. Monitoring metrics include those for RAG, drift, and model performance, while guardrails restrict outputs containing hate speech, aggression, and profanity. The AI risk atlas provides educational guidance on potential risks, supporting organizations in creating robust governance frameworks. Governance workflows integrate roles from model development to deployment and monitoring, fostering a structured and responsible AI lifecycle.
a. Model evaluation
Model evaluation involves running assessments on prompt templates to ensure performance and compliance. Evaluations can be configured using the wizard or via APIs, where you select dimensions and metrics, adjust settings like sample sizes and thresholds, and provide the test data to map input and expected outputs. Results are reviewed in the Evaluations tab, offering insights into metric scores, threshold violations, and visualizations over time to understand model performance and processing efficiency. This process helps ensure robust and effective model deployments.
b. Continuous compliance
The Model Risk Governance (MRG) solution facilitates comprehensive governance across all model types within an organization. It employs object types like Models, Model Groups, Use Cases, and Use Case Reviews to manage compliance, risk ratings, and stakeholder approvals. The dashboard provides a centralized view of compliance status, validation, and risk levels. Workflows automate key governance processes, such as use case approval and model lifecycle management, ensuring thorough oversight from development to deployment. This integrated approach enables continuous compliance and risk management for enterprise AI solutions.
c. Runtime monitoring
Models are monitored in real time, enabling OEMs to track the performance of their generative AI applications at the edge. This monitoring encompasses metrics such as fairness, drift, and answer relevance.
The collaboration between IBM and Qualcomm Technologies marks a significant milestone in edge AI development. By combining IBM's expertise in enterprise AI and governance with Qualcomm Technologies’ leadership in edge computing, we're providing developers with an unparalleled platform to create, deploy, and manage responsible AI solutions at the edge. This collaboration doesn't just streamline the AI development process—it opens up new possibilities for innovation across industries. From smart manufacturing to autonomous vehicles, the potential applications are limitless.
We invite developers and businesses to explore this powerful new solution. Together, let's push the boundaries of what's possible with AI at the edge.
To learn more and get started, visit IBM watsonx and Qualcomm AI Hub.