Design and implement full-stack AI solutions using LLMs and multi-agent orchestration. Engage with customers for solution delivery while troubleshooting and optimizing performance under pressure. Fine-tune models and collaborate across teams for large-scale AI deployment.
Meet the Team
We are a dynamic AI solutions team specializing in building innovative technology for large-scale training and inference clusters using Cisco infrastructure components. Our focus includes developing advanced AI applications, creating and fine-tuning domain-specific LLMs or building entirely new LLMs from scratch, providing LLMs as a service, and developing sophisticated RAG solutions. We also apply Agentic AI to optimize and manage network operations. We work with more than 20 of the largest service providers in the APJ region, building next-generation AI solutions that serve over 50% of humanity through these providers. Our team is dedicated to revolutionizing the AI landscape and demonstrating the latest advancements to drive significant, real-world impact.
Your Impact
End-to-End AI & Agentic AI Architecture, Design & Development
- Design and implement full-stack AI solutions using LLMs, multi-agent orchestration frameworks, knowledge graphs, RAG pipelines, vector databases, and end-to-end automation workflows.
- Develop high-performance backend components using Python and Go, including microservices, inference pipelines, orchestration logic, evaluation frameworks, and data preprocessing layers.
- Build and operationalize AI workloads on Cisco AI Factory platforms, including GPU clusters, RDMA-optimized fabric, storage, monitoring, scaling, and workload scheduling.
- Integrate AI pipelines with Cisco platforms such as CNC, PCA, HCO, ND, Foresight Master Agent, Telemetry pipelines, SIEM/SOAR, and CX engineering systems.
- Ensure designs meet production-grade standards for scalability, resiliency, security, and observability.
Customer-Facing Engineering, Workshops, Solution Validation & Delivery
- Engage directly with customers to understand business requirements, technical constraints, security posture, and success metrics.
- Conduct deep-dive workshops, design sessions, and solution walk-throughs with customer architecture, NOC/SOC, and operations teams.
- Build PoCs, demos, and reference architectures tailored to customer networks, data models, and operational workflows.
- Support deployment in customer labs and production, ensuring smooth onboarding, configuration, performance tuning, and validation.
- Communicate complex technical topics clearly to both CXOs and engineering teams, acting as a trusted advisor for AI-driven automation and transformation.
Real-time Troubleshooting, Performance Optimization & Pressure Handling
- Diagnose issues across multi-layer architecture: GPU workloads, RAG stores, LLM inference, agent workflows, data pipelines, network telemetry, and security logic.
- Work in high-pressure, time-critical situations, supporting critical customer issues, live demos, high-visibility PoCs, and executive reviews.
- Optimize inference latency, throughput, memory footprint, GPU utilization, and service reliability for large-scale deployments.
- Implement monitoring systems, fault detection, and self-healing behaviors aligned with Cisco AI Factory's operational principles.
- Rapidly prototype and validate fixes, improvements, and automation scripts using Python/Go for real-time resolution.
Multi-functional Collaboration for Large-Scale Programs
- Coordinate with Cisco BU, CX, Sales Engineering, Partners, and Customer Teams to deliver end-to-end AI Factory-based solutions.
- Translate customer workflows into technical requirements, PRDs, and solution blueprints used across development teams.
- Participate in design reviews, architecture councils, customer governance meetings, and cross-functional planning discussions.
- Ensure alignment between Agentic AI frameworks, Cisco platform SDKs, CNC-based controllers, and customer automation stacks.
- Drive documentation, standard processes, and reusable assets for repeatable deployments across APJ and global SP/Enterprise accounts.
Continuous Improvement, Model Tuning & AI Lifecycle Operations
- Fine-tune LLMs and domain-specific models using customer data while maintaining privacy, anonymization, and on-prem compliance policies.
- Implement continuous retraining, evaluation, versioning, and rollout pipelines integrated into AI Factory DevOps.
- Conduct model benchmarking, hallucination analysis, safety checks, guardrail enforcement, and accuracy improvements.
- Enhance end-to-end system performance across retrieval, inference, agents, reasoning, orchestration, and telemetry.
- Contribute to long-term roadmap discussions, architectural evolution, and next-generation AI Factory capabilities.
Minimum Qualifications
- Strong expertise in Python and Go, including API development, microservices, async workflows, and distributed systems.
- Deep understanding of LLMs, vector databases, embeddings, RAG frameworks, agentic AI orchestrators, and prompt engineering.
- Hands-on experience with GPU environments (NVIDIA or AMD), optimized inference stacks, RDMA/RoCE-based GPU networking, and AI workload scheduling.
- Experience integrating AI solutions with network automation platforms, telemetry pipelines, and security frameworks.
- Ability to work under extreme time pressure, manage ambiguity, and deliver consistent high-quality outcomes in front of demanding customers.
- Strong problem-solving, debugging, and systems thinking across compute, storage, network, and ML layers.
- Excellent communication skills with the ability to explain complex AI architectures to both engineering and executive stakeholders.
Preferred Qualifications
- Experience with Service Provider networks, telecom data, and NOC/SOC operations.
- Familiarity with Cisco CNC, PCA, HCO, Foresight, ThousandEyes, SecureX, Splunk, and related ecosystems.
- Knowledge of Kubernetes, distributed tracing, event-driven architectures, Kafka/NATS, microservices, and modern DevOps practices.
- Previous experience deploying AI or automation solutions in production enterprise or SP environments.
Why Cisco?
At Cisco, we're revolutionizing how data and infrastructure connect and protect organizations in the AI era - and beyond. We've been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint.
Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you'll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere.
We are Cisco, and our power starts with you.
Top Skills
AI
Go
GPU
Kafka
Kubernetes
LLMs
Microservices
NATS
Python
RDMA
Splunk
Vector Databases

