Staff Software Engineer, Compute
You are an experienced software engineer who thrives on building large-scale computing platforms. You have deep expertise in large-scale distributed systems that handle high complexity, heavy traffic, and large volumes of data. You know how to achieve reliability and scale with minimal operational load.
Key responsibilities
• Build our core Python/Rust platform: request routing, AI workload orchestration, scheduling, GPU autoscaling, large-scale file storage, queueing, etc.
• Produce forward-looking designs for platform evolution as we scale to 100x current traffic and deliver low latency worldwide
• Use AI extensively to automate the mundane parts of building complex but reliable systems
• Profile and tune low-level CPU and memory performance
Requirements
• 5+ years of experience building distributed compute and orchestration platforms in Python or Rust
• Strong understanding of distributed systems fundamentals: consensus, scheduling, fault tolerance, capacity planning
• Deep understanding of computational complexity and memory allocation
• Track record of designing systems that scale under real production load
• Experience building and using observability to drive performance and reliability decisions
• Excellent communication and ability to drive technical decisions across teams
• Self-starter who executes quickly, takes ownership, and constantly seeks improvement
Nice to have
• Experience with AI/ML inference or training infrastructure
• Experience with high-performance systems programming (async runtimes, zero-copy, memory-safe concurrency)
• Background in building multi-tenant compute platforms
• Understanding of networking fundamentals and performance characteristics
• Familiarity with GPU workload characteristics and scheduling constraints
Location
• Turkey
What we offer at fal
• Interesting and challenging work
• Plenty of learning and growth opportunities
• Regular team events and offsites