DigitalOcean

Staff Software Engineer-AI

Posted 2 Hours Ago
In-Office
Hyderabad, Telangana
Expert/Leader
Technical leader designing and implementing massive-scale, stateful multi-agent multi-turn simulation and evaluation systems. Build persona synthesis pipelines, what-if benchmarking frameworks, durable workflow orchestration, and high-performance APIs while integrating LLMs and agentic architectures. Drive architecture, mentor engineers, lead cross-functional strategy, and ensure scalability, reliability, and observability for AI feedback and evaluation infrastructure.
The summary above was generated by AI

Dive in and do the best work of your career at DigitalOcean. Journey alongside a strong community of top talent who are relentless in their drive to build the simplest scalable cloud. If you have a growth mindset, naturally like to think big and bold, and are energized by the fast-paced environment of a true industry disruptor, you’ll find your place here. We value winning together while learning, having fun, and making a profound difference for the dreamers and builders in the world.

About the Role

DigitalOcean's Agentic AI organization provides a powerful inference cloud, Managed Agents, and robust Feedback systems that enable customers to run AI inference confidently at scale. We are looking for a Staff Software Engineer to serve as a technical leader within our Feedback Systems team, driving the architecture for the massive-scale infrastructure that simulates, tests, and evaluates AI agents.
As an IC5 Staff Engineer, you will define the architectural vision for systems that simulate multi-agent, multi-turn deployments complete with tool integration. This enables customers to run and analyze "what-if" scenarios to evaluate alternative configurations for their AI agents. This is a high-impact leadership role where you will solve advanced problems at the intersection of LLM orchestration, synthetic data generation, and behavioral simulation—including defining realistic user personas from historical telemetry and structuring automated evaluation objectives and constraints. You will set the technical standard for the team and guide the engineering strategy across the Agentic AI organization.

What You'll Do:
  • Simulation Architecture & Orchestration: Leading the end-to-end design and implementation of multi-agent, multi-turn simulation environments. You will architect systems where synthetic users and AI agents interact dynamically, leverage tools, and execute complex workflows to test boundary conditions.
  • Persona & Scenario Generation: Designing ML pipelines that analyze historical user conversations to automatically extract, define, and synthesize realistic user personas and multi-turn simulation goals, mirroring real-world customer behavior.
  • "What-If" Evaluation Frameworks: Building the core methodology and scoring infrastructure that allows customers to run alternative configuration scenarios, benchmark agent behavior, and safely evaluate non-deterministic agent outputs against defined success criteria.
  • Architectural Leadership: Leading the end-to-end design and architecture of high-throughput, stateful workflow orchestration systems capable of managing complex, multi-turn AI agent simulations at massive scale.
  • System Design & Integration: Defining robust, scalable API contracts and system boundaries bridging upstream telemetry data, asynchronous simulation engines, and secure remote execution environments.
  • Technical Strategy: Driving the technical roadmap for the Feedback Systems team, balancing long-term scalability and resilience with iterative product delivery.
  • Complex Problem Solving: Designing elegant solutions for hard distributed systems challenges, including rate limiting, backpressure, state management, and reliable execution of non-deterministic workflows.
  • Mentorship & Elevation: Mentoring senior engineers, leading cross-organizational architectural reviews, and establishing engineering best practices for code quality, testing, and system observability.
  • AI/ML Infrastructure Integration: Applying your practical experience with AI/ML platforms to design and implement the backend infrastructure that powers our evaluation engines, actively managing the complexities of integrating with LLMs, prompt routing, and non-deterministic agentic workflows.
  • Cross-Functional Influence: Acting as the strategic technical bridge across the Agentic AI organization, partnering closely with Product Managers and peer engineering leaders to translate complex product requirements, evaluation methodologies, and experimental needs into a scalable, future-proof architectural roadmap.
What You’ll Add to DigitalOcean:
  • Agentic Expertise: 5+ years of software engineering experience with deep proficiency in modern AI/ML frameworks, LLM orchestration (e.g., LangChain, AutoGen, CrewAI, or custom multi-agent frameworks), and production-grade Python and Go.
  • Behavioral Modeling & Persona Synthesis: Background in processing natural language data (e.g., historical user chat logs, support tickets) to algorithmically extract user intent, synthesize realistic personas, and generate deterministic goals for simulation.
  • Evaluation & "What-If" Benchmarking: Solid experience building evaluation frameworks for non-deterministic AI systems, including establishing metrics, guardrails, scoring rubrics, and regression testing methodologies for LLM configurations.
  • Data Fluency & Orchestration: Strong understanding of managing complex state in asynchronous architectures, streaming LLM tokens, handling rate limits, and manipulating heavy data pipelines to feed simulation engines.
  • Ownership & Pragmatism: A strong sense of technical ownership, a passion for balancing cutting-edge ML research with practical product delivery, and excellent communication skills to collaborate across a globally distributed team.
  • Extensive Experience: 10+ years of software engineering experience, with a proven track record operating at a Staff, Principal, or Architect level designing mission-critical distributed systems.
  • Distributed Systems Expertise: Expert-level understanding of designing highly concurrent, fault-tolerant, and globally scalable backend architectures.
  • Advanced Orchestration: Deep architectural experience with stateful, durable workflow orchestration engines and managing complex asynchronous lifecycles at scale.
  • API & Systems Integration: Extensive experience designing resilient, high-performance APIs (e.g., gRPC) and managing high-throughput message/event-driven architectures.
  • AI/ML Engineering Experience: While you do not need to be an ML researcher, you have demonstrable experience building, scaling, or integrating backend infrastructure for AI/ML products. This includes hands-on experience working with LLMs, agentic architectures, and solving the unique infrastructure challenges of testing non-deterministic systems.

*This job is located in Hyderabad, India


Why You’ll Like Working for DigitalOcean
  • We innovate with purpose. You’ll be part of a cutting-edge technology company on an upward trajectory, one that is proud to simplify cloud and AI so builders can spend more time creating software that changes the world. As a member of the team, you will be a Shark who thinks big, bold, and scrappy, like an owner with a bias for action and a powerful sense of responsibility for customers, products, employees, and decisions.
  • We prioritize career development. At DO, you’ll do the best work of your career. You will work with some of the smartest and most interesting people in the industry. We are a high-performance organization that will always challenge you to think big. Our organizational development team will provide you with resources to ensure you keep growing. We provide employees with reimbursement for relevant conferences, training, and education. All employees have access to LinkedIn Learning's 10,000+ courses to support their continued growth and development.
  • We care about your well-being. Regardless of your location, we will provide you with a competitive array of benefits to support you, from our Employee Assistance Program to local employee meetups to a flexible time-off policy, to name a few. While the philosophy around our benefits is the same worldwide, specific benefits may vary based on local regulations and preferences.
  • We reward our employees. The salary range for this position is based on market data, relevant years of experience, and skills. You may qualify for a bonus in addition to base salary; bonus amounts are determined based on company and individual performance. We also provide equity compensation to eligible employees, including equity grants upon hire and the option to participate in our Employee Stock Purchase Program.
  • DigitalOcean is an equal-opportunity employer. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.

Application Limit: You may apply to a maximum of 3 positions within any 180-day period. This policy promotes better role-candidate matching and encourages thoughtful applications where your qualifications align most strongly.


