Experience Level
Entry Level
Qualifications
Proven experience with cloud platforms such as AWS, Azure, or Google Cloud. Strong understanding of networking, security, and infrastructure as code. Experience with DevOps practices and CI/CD pipelines. Proficiency in scripting languages such as Python, Bash, or PowerShell. Excellent problem-solving skills and ability to work collaboratively in a fast-paced environment.
About the job
We are seeking a talented Cloud Infrastructure Engineer to join our dynamic team at Mindlance. In this role, you will be responsible for designing, implementing, and maintaining scalable cloud infrastructures to support our robust applications and services. You'll work closely with cross-functional teams to ensure optimal performance and security of cloud resources.
About Mindlance
Mindlance is a leading staffing and consulting firm dedicated to providing innovative solutions and exceptional service. With a commitment to excellence, we strive to empower our clients with the best talent and technology in the industry.
Full-time|$180K/yr - $200K/yr|Remote|New York, New York, United States; Remote; San Francisco, California, United States; Seattle, Washington, United States
About Us
Lightning AI, the innovative force behind PyTorch Lightning, has been revolutionizing the AI landscape since 2019. We provide an all-encompassing platform designed to streamline the development, training, and deployment of AI systems, facilitating the transition from research to production effortlessly. Following our merger with Voltage Park, a cutting-edge…
Schmidt Sciences is a nonprofit organization established in 2024 by Eric and Wendy Schmidt, dedicated to accelerating scientific advancements and breakthroughs using cutting-edge tools to promote a sustainable planet. The organization emphasizes research in impactful fields such as AI and advanced computing, astrophysics, biosciences, climate science, and space exploration while also supporting a diverse array of researchers through its science systems program.
About the AI & Advanced Computing Institute (“AI Institute”)
The AI Institute at Schmidt Sciences operates as a grantmaking and research entity that recognizes AI as a pivotal driver for scientific discovery and societal advancement. Over the upcoming decade, we aim to empower leading researchers focused on enhancing AI systems to be competent, trustworthy, and reliable partners for scientists in groundbreaking discoveries. Our investments will target beneficial AI domains where philanthropy can make a significant difference. By fostering enabling infrastructure, fundamental research, and specialized scientific programs, the AI and Advanced Computing Institute will cultivate an environment where AI-enabled discoveries can thrive. The AI Institute currently concentrates on three core areas:
AI for Science – Utilizing AI to enhance the scientific process, including hypothesis generation, experimental execution, data analysis, and knowledge production, all aimed at accelerating discovery. Key initiatives include exploring post-transistor hardware for AI and AI applications in scientific simulation.
Science of AI – Investigating and managing AI systems to address potential risks associated with advanced AI. This includes improving AI reliability and performance in less commercially viable areas. Current programs comprise AI2050 and Science of Trustworthy AI, with explorations into AI interpretability and multi-agent communication evolution.
Beneficial AI – Establishing scientific foundations and datasets to comprehend the broader societal impacts of AI. This encompasses high-impact grantmaking initiatives, such as leveraging AI to expedite humanities research (Humanities and AI Virtual Institute) and assessing AI's impact on the labor market (AI@Work). These initiatives are interconnected with other Schmidt Sciences efforts, like the Virtual Institute for Scientific Software (VISS), which aims to enhance the pace of scientific discovery through the development and support of high-quality, community-focused scientific software.
Full-time|$102K/yr - $145K/yr|On-site|New York, NY / Sunnyvale, CA / Bellevue, WA
CoreWeave builds cloud infrastructure for AI workloads, serving a range of clients from startups to global enterprises. Since 2017, the company has focused on delivering strong technical performance and support for innovators in the AI space. CoreWeave became publicly traded (Nasdaq: CRWV) in March 2025. More information is available at www.coreweave.com.
Role Overview
The Hardware Engineer - GPU & PCIe will join the Hardware Engineering team and report to the Hardware Engineering Manager. This role centers on designing, developing, troubleshooting, and optimizing server hardware, with a focus on GPU and PCIe systems. Collaboration with cross-functional teams, vendors, and stakeholders is essential to deliver reliable, high-performance hardware solutions.
Key Responsibilities
Troubleshoot complex GPU and PCIe failures.
Work with external vendors on failure analysis.
Track and monitor component RMAs.
Develop and maintain hardware and firmware management services.
Automate processes throughout the server hardware lifecycle.
Serve as the senior escalation point for hardware troubleshooting.
Partner with cross-functional teams to define hardware requirements, specifications, system architecture, and issue resolution playbooks.
Create and update documentation for hardware designs, specifications, test procedures, and results.
Analyze hardware system performance, identify bottlenecks, and recommend improvements.
Establish processes for internal hardware testing, deployment, optimization, and troubleshooting.
Locations
This position is available in New York, NY, Sunnyvale, CA, or Bellevue, WA.
Full-time|$200K/yr - $300K/yr|On-site|New York, NY
About Fluidstack
At Fluidstack, we are pioneering the infrastructure that powers advanced artificial intelligence. Collaborating with leading AI laboratories, government entities, and major corporations (including Mistral, Poolside, Black Forest Labs, and Meta), we aim to deliver compute capabilities at unparalleled speeds.
Our mission is to expedite the realization of Artificial General Intelligence (AGI). Our team is driven by a sense of urgency and is dedicated to providing top-tier infrastructure. We view our clients' success as our own and take pride in the robust systems we create and the trust we cultivate. If you are inspired by meaningful work, strive for excellence, and are prepared to exert yourself to advance the future of intelligence, we invite you to join us in shaping what lies ahead.
About the Role
In the capacity of a System Engineer for our GPU Fleet, you will oversee, operate, and optimize our large-scale GPU compute infrastructure, which is essential for AI/ML training and inference processes. Your role will ensure the high availability, performance, and reliability of our GPU server fleet through automation, monitoring, troubleshooting, and collaboration with hardware engineering, platform teams, and data center operations.
Key Responsibilities
Maintain and operate a vast GPU server fleet (H100, B200, GB200) catering to AI/ML workloads; continuously monitor system health, performance, and utilization to ensure maximum uptime and adherence to SLA.
Conduct hands-on troubleshooting and root cause analysis for complex hardware, firmware, operating system, and application issues across GPU clusters; collaborate with vendors and hardware teams to rectify systemic failures.
Create and sustain automation scripts for efficient provisioning, configuration management, monitoring, and remediation on a large scale.
Enhance tools for GPU health assessments, performance diagnostics, driver validation, and automated recovery processes.
Implement server provisioning, configuration, firmware updates, and OS installations utilizing automation frameworks; manage lifecycle operations encompassing deployment, maintenance, and decommissioning.
Engage in 24x7 on-call rotation; respond to production incidents and coordinate resolution efforts with cross-functional teams, including data center operations, network engineering, and application teams.
Lead post-incident reviews, document root causes, and spearhead continuous improvement initiatives focused on automation, reliability, monitoring, and operational efficiency.
Our Mission
At Reflection AI, our mission is to develop open superintelligence and make it available to everyone.
We are creating open weight models that cater to individuals, agents, enterprises, and even nations. Our skilled team of AI researchers and innovators hails from leading organizations such as DeepMind, OpenAI, Google Brain, Meta, Character.AI, and Anthropic.
About the Role
The Compute Platform team at Reflection AI focuses on ensuring our compute layer is robust and highly available. Our K8s-based platform spans multiple neo-clouds, tackling complex systems challenges related to multi-cloud scheduling, node health, and performance debugging. You will collaborate closely with our training teams to design strategies for fault tolerance, health checks, and remediation processes.
Key Responsibilities
Cluster Management: Develop and maintain tools for automatic remediation, topology-aware scheduling, capacity planning, and expedited hardware debugging.
Platform Engineering: Design and refine our cluster management stack to efficiently handle workloads across extensive multi-GPU fleets.
Monitoring & Observability: Establish an all-encompassing monitoring system for the cluster, emphasizing durability and active performance benchmarking.
Roadmap Execution: Prepare the infrastructure for next-gen GPU deployments and larger cluster sizes. In the long run, you will contribute to managing multi-cloud storage, petabyte-scale data replication, and optimizing GPU-to-GPU network performance.
numus is on the lookout for a talented and seasoned DevOps Infrastructure Engineer dedicated to overseeing and enhancing our cloud-based systematic trading infrastructure. In this role, you will develop robust tools that facilitate and expedite research efforts. The ideal candidate will possess significant experience in software development tools, production management, and cloud technology within a systematic trading context.
Key Responsibilities:
Create, manage, and maintain our production, research, and trading infrastructure across multiple cloud regions.
Optimize low-latency systematic trading compute and network infrastructure.
Support extensive research computing and modeling workloads with resilient, auto-scaling clusters and efficient hardware utilization.
Troubleshoot and resolve technical issues, including server downtime, network connectivity problems, and cloud outages.
Monitor system performance, proactively identifying and addressing performance bottlenecks, security vulnerabilities, and other issues that could impact trading operations.
Collaborate closely with quantitative researchers, traders, developers, and other stakeholders to understand their needs, ensuring that the trading and research infrastructure meets their requirements.
Join CoreWeave as an Engineering Manager, leading our Data Infrastructure team. You will be at the forefront of designing, building, and scaling our data systems that support our innovative cloud solutions. This role is essential for driving efficiency and performance in our data handling, ensuring our infrastructure meets the demands of our growing customer base.
Role Overview
Jump Trading LLC is hiring an Electrical Engineer focused on High-Performance Computing (HPC) Infrastructure. This position is based in Chicago or New York. The role centers on designing and maintaining electrical systems that power advanced trading technologies.
What You Will Do
Work with cross-functional teams to improve the performance of HPC electrical infrastructure.
Ensure reliability and efficiency across all electrical systems supporting trading operations.
Develop and implement new engineering solutions to meet evolving technology needs.
Location
Chicago or New York
Join Hudson River Trading (HRT) as a Software Engineer dedicated to enhancing GPU reliability within our innovative Systems Development team. Our team is responsible for building and maintaining the foundational platform utilized by all Systems teams to provision, monitor, and manage HRT’s expansive server and network infrastructure. In this pivotal role, you will focus on developing Python-based tools to analyze GPU hardware performance while crafting inventive solutions to boost observability, reliability, and efficiency across our GPU fleet. Collaborating closely with various engineering teams, you’ll gain insights into research and trading workflows to ensure optimal utilization of our GPU infrastructure.
About the Role:
As the Compute Engineering Lead, you will address one of our most significant challenges: enhancing performance. Our users appreciate Hex's robust platform for complex data applications, yet they seek faster execution. You will spearhead initiatives to refine our compute architecture while preserving the flexibility and power that distinguish Hex.
Leading a team of experts, you will guide our performance investments, solidify our foundational architecture, and strive towards an ambitious vision where performance becomes a competitive edge in the market.
What You Will Do:
Design and implement an architecture tailored for multimodal workflows, ensuring it retains the beloved features of Hex while optimizing performance across diverse data scales.
Identify and analyze performance bottlenecks in detail, leading the team to consistently achieve measurable enhancements.
Establish data-driven metrics to set performance targets and track progress.
Deliver significant projects incrementally with seamless customer rollouts, maintaining high execution standards.
Create appropriate abstraction layers that empower teams to tackle performance issues effectively with clear guarantees.
Integrate compute initiatives with cross-team plans to amplify the impact of new product development.
Collaborate with field teams to leverage our unique architecture as a competitive advantage.
Inspire confidence across the organization in our compute strategy and clearly communicate its importance.
About the Role
At Counsel Health, we have developed an innovative product that has garnered admiration from both patients and physicians, utilizing a fast-paced technology stack. As we continue to grow, the complexity between our application code and infrastructure is expanding. We are seeking a talented individual who thrives in this dynamic intersection.
We are in search of a Backend Infrastructure Engineer who embodies both software engineering and infrastructure expertise. This role is not about working in isolation writing Terraform scripts all day or solely delivering product features in React. Instead, you will work within the essential layer that connects our CI/CD pipelines, Infrastructure as Code (IaC) for our AWS environment, backend services, and the developer tools that empower our rapidly expanding engineering team.
Your Responsibilities
Enhance Our Infrastructure-as-Code: Design and manage our cloud infrastructure utilizing Terraform.
Optimize CI/CD Processes: Take ownership of our build, test, and deployment pipelines from start to finish. Aim to make production deployments effortless, speedy, reliable, and automated.
Backend Development: Create production-level backend services and contribute to our core server-side systems. You are proficient in writing application code, not just orchestrating it.
Create Core Archetypes: Develop production-grade archetypes for new deployments, jobs, consumers, and workers. You are adept at building an abstract base stream consumer or cron executor for widespread application.
Enhance Developer Experience: Act as the engineer who accelerates the productivity of other engineers. Streamline local development workflows, decrease build times, resolve flaky infrastructure issues, and create internal tools that the team genuinely values.
Security & Compliance: Collaborate closely with our security and compliance team to strengthen our infrastructure security posture, including secrets management, IAM policies, network isolation, and audit logging in a HIPAA-compliant environment.
Who You Are
A Bridge Builder: You feel at home in an IDE crafting application code as well as in a terminal managing cloud infrastructure. You view backend and infrastructure as intertwined disciplines.
An Automation Advocate: If a task is done manually more than twice, you are already scripting it for automation.
A Systems Thinker: You comprehend how a request traverses through various systems and how they interact.
Join 10alabs as an MLOps / Infrastructure Engineer, where you will play a crucial role in streamlining the deployment and management of machine learning models. Collaborate with cross-functional teams to build robust infrastructure that supports scalable and efficient AI solutions. Your expertise will help us enhance our platform and drive innovation.
Join arch.co as a Lead Infrastructure Engineer, where you will play a pivotal role in designing, implementing, and maintaining our cutting-edge infrastructure systems. Collaborate with cross-functional teams to ensure robust architecture and performance optimization, while mentoring junior engineers and providing technical leadership.
Full-time|$150K/yr - $250K/yr|On-site|New York, NY
Join Fluidstack: Pioneering the Future of Infrastructure
At Fluidstack, we are at the forefront of building the infrastructure for a new era of intelligence. Collaborating with leading AI labs, government entities, and top-tier enterprises such as Mistral, Poolside, Black Forest Labs, and Meta, we are dedicated to unlocking computational power at unprecedented speeds.
Our mission to realize Artificial General Intelligence (AGI) fuels our urgency and commitment to excellence. We pride ourselves on building world-class infrastructure and treating our clients’ outcomes as our own. If you are driven by purpose, strive for excellence, and are ready to contribute to the acceleration of intelligence's future, we invite you to join our team in shaping what comes next.
Role Overview
We are looking for a seasoned Technical Program Manager specializing in IT Asset Management. This individual will oversee the comprehensive lifecycle management of our global distributed GPU infrastructure assets, focusing on optimizing utilization, managing procurement and deployment, and maximizing return on investment across our hardware fleet worldwide.
Key Responsibilities
Craft and execute asset management strategies for GPU servers, networking devices, and data center infrastructure across our international locations.
Oversee the full asset lifecycle from procurement to deployment, maintenance, returns, and decommissioning.
Establish asset inventory systems and tracking processes, along with predictive models for hardware refresh cycles.
Lead cross-functional initiatives aimed at optimizing asset utilization and lowering overall costs of ownership.
Manage asset depreciation, financial modeling, and develop business cases for hardware investments.
Negotiate vendor contracts and set service-level agreements (SLAs) for deployment, maintenance, and end-of-life processes.
Oversee warehouse operations and logistics, including receiving, inspection, kitting, shipping, customs, and international freight.
Collaborate with third-party logistics (3PL) providers to optimize shipping costs, transit times, and inventory accuracy.
Develop robust return merchandise authorization (RMA) processes for defective hardware and manage warranty claims and repair depot operations.
Monitor failure rates, mean time to repair (MTTR), and turnaround times to inform procurement and vendor selection.
Implement secure decommissioning and data sanitization procedures in compliance with GDPR, CCPA, and other security policies.
Coordinate the physical destruction of storage devices and maintain certificates of destruction while managing electronic waste (e-waste).
Role Overview
BitGo is looking for a Senior Infrastructure Engineer in New York to help strengthen and expand the systems behind its crypto financial services. This role focuses on designing, building, and maintaining scalable infrastructure that supports the company's growth and reliability goals.
What You Will Do
Design and implement systems that scale with increasing demand.
Maintain existing infrastructure to ensure high reliability and performance.
Identify areas for improvement and contribute to better engineering practices.
About BitGo
BitGo provides crypto financial services and relies on a strong engineering foundation to deliver secure, high-performing products.
Full-time|$180K/yr - $270K/yr|On-site|New York, NY
Join Fluidstack - Pioneers in AI Infrastructure
At Fluidstack, we are revolutionizing the landscape of artificial intelligence by constructing the backbone of advanced computing solutions. Collaborating with elite AI research facilities, governmental bodies, and industry giants such as Mistral, Poolside, and Meta, we are unlocking computational capabilities at unprecedented speeds.
Our mission is to accelerate the realization of Artificial General Intelligence (AGI). Our team is driven, passionate, and dedicated to building infrastructure that meets the highest standards. We take ownership of our clients’ successes, pride ourselves on the innovative systems we create, and strive to earn their trust every day. If you are fueled by purpose, strive for excellence, and are ready to contribute to a transformative future, we invite you to join us.
The Role - Electrical Design Engineer
As an Electrical Design Engineer at Fluidstack, you will be instrumental in designing resilient and cost-efficient electrical distribution systems that power our extensive GPU cloud infrastructure. You will ensure exceptional uptime for our AI and High-Performance Computing (HPC) clients who rely on our expertise for their most demanding tasks. Your role will involve articulating and justifying technical choices to senior leadership while driving innovations in our data center designs.
We are seeking engineers with practical electrical design experience. If you possess the skills to design electrical systems, understand the critical requirements for high-density GPU data centers, and can assess the feasibility of various designs, you may be the ideal candidate. As we rapidly expand, we are looking to enhance our in-house engineering capabilities. You will take designs from concept through to permit and construction documentation. You will also define essential equipment specifications and review equipment submissions. In this role, you will support construction and be engaged throughout the entire project lifecycle, from site selection to commissioning and final handover.
Key Responsibilities
Design and develop electrical systems for our high-density GPU data centers, working collaboratively to produce comprehensive construction documents.
Deliver designs that adhere to our high-quality standards while remaining within project budgets and timelines.
Collaborate with commissioning teams to rigorously test, validate, and document the installation and performance of electrical systems prior to handover.
Oversee multiple concurrent projects across Fluidstack's expanding footprint, effectively managing priorities across various regions.
About Polymarket
Polymarket stands as the leading prediction market globally, where insights meet investment: part betting, part future forecasting.
As we accelerate our growth, with over $6 billion traded this year alone, we aspire to be a trusted source of information in global media. Join us in our mission to revolutionize how the world perceives truth.
About the Role
We are seeking a top-tier Infrastructure Engineer to enhance our engineering platform. This pivotal role involves designing, building, and managing the foundational systems that empower our diverse product offerings. You will play a crucial part in establishing developer workflows, CI/CD pipelines, and robust infrastructure that supports high-performance APIs and data services. Your expertise will guide significant technical decisions, shape system architecture, and implement solutions that prioritize efficiency, reliability, and sustainability.
What You’ll Do
We are in search of a seasoned individual contributor passionate about optimizing essential infrastructure.
Build & Automate Development Workflows. You will design and maintain infrastructure that standardizes our software development, versioning, testing, and deployment processes across various environments.
Architect Scalable Systems. You will make informed architectural decisions, weighing risks, performance, costs, and maintainability.
Power High-Performance Services. You will enhance the infrastructure that supports real-time APIs and low-latency data feeds.
Develop Testing Infrastructure. You will construct reliable tooling for automated testing, encompassing unit, integration, and API levels.
Ensure Availability & Reliability. You will oversee critical infrastructure to minimize downtime and mitigate failure risks.
Move Fast Without Breaking Things. You will deliver high-quality solutions efficiently, showcasing a strong sense of ownership and attention to detail.
Operate Autonomously. You will take initiative and make independent decisions without micromanagement.
Collaborate Across Teams. You will work closely with engineers across the organization to address deployment, performance, and operational requirements.
What We’re Looking For
6+ years of hands-on experience in infrastructure engineering or related fields.
Proficiency in cloud platforms (AWS, GCP, Azure) and container orchestration (Kubernetes).
Strong knowledge of CI/CD tools and practices.
Experience with database technologies and API design.
Exceptional problem-solving skills and a collaborative mindset.
About Knot
At Knot, we are on a mission to revolutionize the way consumers and businesses interact through seamless merchant and banking experiences. Think of us as the 'Plaid for merchant connectivity.' Our innovative platform is designed to connect merchants with the multitude of applications that enhance everyday transactions. Our flagship product, CardSwitcher, empowers consumers to effortlessly update and manage their payment methods across various online merchant accounts like Netflix and PayPal. Additionally, our advanced solution, TransactionLink, allows for the retrieval of detailed transaction data, paving the way for new product development on our unique merchant connectivity platform. We invite you to join us in building these exciting new solutions!
Founded in 2021 by brothers and Thiel Fellows Rory and Kieran O’Reilly, Knot currently facilitates connected online payment experiences for hundreds of thousands of users. Our technology is trusted by industry leaders like American Express, PayPal, Current, BILT, and Step, who integrate Knot’s SDK into their applications to deliver exceptional experiences to their customers.
Backed by a distinguished group of investors including Nava Ventures, 8VC, and prominent figures from companies such as Twitter, Warby Parker, and DraftKings, Knot is well-positioned for continued growth and innovation.
Working at Knot
We pride ourselves on having a world-class team from diverse backgrounds, with a strong emphasis on engineering talent. As we expand our footprint in NYC, we aim to be at the forefront of the financial services landscape.
Our team is dedicated to building exceptional products for our users, balancing a serious approach to our work with a fun and engaging work environment. We believe both aspects are integral to our success.
Your Role
Design, architect, deploy, document, and oversee our cloud-based network infrastructure.
Take ownership of critical API infrastructure that handles hundreds of requests per second.
Lead technical decisions, providing justification for designs and coordinating with other teams to ensure alignment on values and requirements.
Continuously enhance your knowledge of our infrastructure's long-term needs and capabilities.
Manage and troubleshoot complex technical issues and incidents, providing support and solutions as necessary.
Join Crosby as an Infrastructure Engineer and play a crucial role in designing, implementing, and maintaining robust infrastructure systems. You will collaborate with cross-functional teams to ensure optimal performance, security, and scalability of our infrastructure.
May 22, 2017