SW Development Manager, Trainium Manufacturing, Quality and Reliability

Amazon

  • Austin, TX
  • Permanent
  • Full-time
  • 21 hours ago
DESCRIPTION
Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerate innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.
At Annapurna Labs, we are at the forefront of hardware/software co-design, not just in Amazon Web Services (AWS) but across the industry. The Trainium Manufacturing, Quality and Reliability (MQR) team is the part of AWS Annapurna Labs responsible for the end-to-end manufacturing and deployment of these cutting-edge AI products and system designs for the world's largest cloud services provider. The MQR team is looking for candidates interested in leading the manufacturing test team, which develops and deploys manufacturing test FW/SW content to our global manufacturing lines. The scope of this role includes working closely with the HW design teams to identify, define and develop test content while building scalable deployment, data automation and diagnostic mechanisms to ensure high-efficiency operations across our global manufacturing partner sites.
You'll provide leadership in the application of new technologies to large-scale deployments in a continuous effort to deliver a world-class customer experience. This is a fast-paced, intellectually challenging position, and you'll work with thought leaders in multiple technology areas. You'll have high standards for yourself and everyone you work with, and you'll be constantly looking for ways to improve our products' performance, quality and cost. We're changing an industry, and we want individuals who are ready for this challenge and want to reach beyond what is possible today.
Key job responsibilities
As a Test Development Manager, you are responsible for working with the Lead Manufacturing Test Engineer to identify key test coverage gaps based on manufacturing and fleet performance for the current and next-generation Machine Learning Acceleration (MLA) product family. You will lead the team to deliver solutions addressing these needs. Key responsibilities include:
  • Scale and manage a team of manufacturing test and data automation engineers
  • Drive test coverage improvement strategies
  • Develop manufacturing validation methodologies and infrastructures
  • Collaborate with Manufacturing Engineering, Quality and Reliability, HW design, Fleet Operations and Infrastructure teams to ensure delivery of high-quality systems to ODM/CM manufacturing sites and AWS data centers
  • Own test deployment schedules and periodic reporting of key performance indicators
  • Closely monitor global high-volume manufacturing sites/vendors
  • Collaborate with global teams to provide 24x7 operations
  • Provide technical mentoring for the team
About the team
AWS is the world's leading and most trusted provider of virtualized public cloud utility services. We offer our global IT customer base, which spans the private, corporate and government sectors, over 100 fully featured, integrated services in Gen AI, compute, storage, database, analytics, mobile, Internet of Things (IoT) and enterprise applications. AWS operates a worldwide fleet of interconnected enterprise data centers at hyperscale to deliver the capacity that powers our customers' IT infrastructure, enabling them to concentrate on their core competencies through agility and operational efficiency. To learn more about AWS, visit https://aws.amazon.com
AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations that continue to set AWS's services and features apart in the industry. As a member of the UC organization, you'll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
Annapurna Labs is a wholly owned subsidiary of AWS, focused on developing custom silicon and servers including the Nitro, Graviton, Inferentia, and Trainium families of processors.
Machine Learning Annapurna (MLA) functions as a vertically integrated team including software, firmware, hardware, and silicon design in a single organization. We are the Trainium Servers and Systems organization under MLA, focused on Hardware Development, Software Development, Fleet Ops Systems, and Manufacturing, Quality, and Reliability. This position is in the Manufacturing, Quality and Reliability team.
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we're building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and leadership development. We care about your career growth and strive to assign projects that help you develop your leadership and technical expertise so you feel empowered to take on more complex tasks in the future.
BASIC QUALIFICATIONS
- 3+ years of engineering team management experience
- 7+ years of experience working directly within engineering teams
- 3+ years of experience designing or architecting (design patterns, reliability and scaling) new and existing systems
- Knowledge of engineering practices and patterns for the full software/hardware/networks development life cycle, including coding standards, code reviews, source control management, build processes, testing, certification, and livesite operations
- Experience partnering with product or program management teams
- 3+ years of experience working in high-volume manufacturing environments
- Bachelor's or Master's degree in electrical engineering, computer engineering, computer science or equivalent
PREFERRED QUALIFICATIONS
- Experience in communicating with users, other technical teams, and senior leadership to collect requirements, describe software product features, technical designs, and product strategy
- Experience in recruiting, hiring, mentoring/coaching and managing teams of Software and Hardware Engineers to improve their skills and make them more effective
- Experience with statistical yield analysis
- Experience working in high-volume manufacturing environments, particularly with ODM/CM partners
- Experience implementing and maintaining test solutions at scale
- Experience with ML/AI hardware systems and accelerators
- Familiarity with server and data center environments
- Knowledge of hardware/software co-design and validation techniques
- Experience with continuous integration and test automation frameworks
Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.
Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $166,400/year in our lowest geographic market up to $287,700/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit . This position will remain posted until filled. Applicants should apply via our internal or external career site.
