ZENTEIQ

Scientific Foundation Model Architect

Architecting India's Sovereign AI Model for Strategic Independence

The Mission

We are seeking an exceptional Foundation Model Architect to design and implement the core architecture for India's sovereign AI model. The role demands world-class expertise in transformer architectures, attention mechanisms, and large-scale model design; your architectural decisions will directly affect model performance, training efficiency, and downstream applications.

STRATEGIC IMPORTANCE: The architecture decisions you make will define the capabilities and performance of India's first sovereign Scientific AI model, affecting millions of users and thousands of critical applications. Only candidates with demonstrable top-tier expertise across ALL specified domains will be considered.

You will collaborate closely with research scientists and engineers to translate cutting-edge research into robust, production-ready systems that serve strategic national interests while pushing the boundaries of AI capabilities.

Location: Bangalore, India (On-site required for high-security projects)
Experience: 5-10+ years in foundation model architecture
Reporting: Chief AI Officer

Core Responsibilities

Architecture Design

  • Design scientific foundation model architecture with 50B+ parameters
  • Define novel attention mechanisms for improved efficiency
  • Optimize memory footprint and inference speed
  • Design tokenizer strategy and scientific vocabulary
  • Implement advanced position encoding schemes such as RoPE and ALiBi (a minimal sketch follows this list)
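
To make the position-encoding bullet concrete, here is a minimal rotary position embedding (RoPE) sketch in PyTorch. It is illustrative only: the helper names (rope_cache, rotate_half, apply_rope), the split-half rotation layout, and the base of 10000 are assumptions rather than a prescribed design.

import torch

def rope_cache(seq_len: int, head_dim: int, base: float = 10000.0):
    # Per-pair rotation frequencies, as in the RoPE formulation.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    freqs = torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq, dim/2)
    return freqs.cos(), freqs.sin()

def rotate_half(x):
    # Swap the two halves of the head dimension, negating the second.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(q, k, cos, sin):
    # q, k: (batch, heads, seq, head_dim); cos/sin broadcast over batch and heads.
    cos = torch.cat((cos, cos), dim=-1)  # (seq, head_dim)
    sin = torch.cat((sin, sin), dim=-1)
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin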

Research Implementation

  • Translate latest transformer research papers into production code
  • Prototype novel architectural improvements (MoE, GQA, MQA, MoR); a GQA sketch follows this list
  • Conduct comprehensive ablation studies on design choices
  • Optimize architectures for both training and inference phases
  • Implement custom GPU kernels for critical operations
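
As one example of the prototyping work, the sketch below shows the core of grouped-query attention (GQA) in PyTorch: groups of query heads share a single key/value head, shrinking the KV cache. The function name and shape conventions are illustrative assumptions; F.scaled_dot_product_attention is PyTorch 2.x's fused attention entry point.

import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    # q: (batch, n_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)
    group = q.shape[1] // k.shape[1]            # query heads per KV head
    k = k.repeat_interleave(group, dim=1)       # expand KV heads to match Q
    v = v.repeat_interleave(group, dim=1)
    # Dispatches to a fused (Flash Attention) kernel when one is available.
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

With as many KV heads as query heads this reduces to standard multi-head attention; with a single KV head it becomes multi-query attention (MQA).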

Multi-Modal Integration

  • Design unified architectures for text, image, and audio modalities
  • Implement cross-modal attention mechanisms (see the sketch after this list)
  • Create efficient fusion strategies for multi-modal inputs
  • Design modality-specific encoders and decoders
  • Optimize for cross-modal understanding and generation
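
As a hedged illustration of cross-modal attention, the module below lets text tokens attend over image patch embeddings using PyTorch's built-in nn.MultiheadAttention. The class name, dimensions, and fusion direction are assumptions for illustration, not the project's actual design.

import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    # Text tokens query image patch embeddings (one possible fusion direction).
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text_tokens, image_patches):
        # text_tokens: (batch, text_len, d_model); image_patches: (batch, n_patches, d_model)
        fused, _ = self.attn(query=text_tokens, key=image_patches, value=image_patches)
        return self.norm(text_tokens + fused)  # residual around the fusion step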

High-Performance Computing

  • Design distributed training architectures using FSDP and DeepSpeed (an FSDP sketch follows this list)
  • Implement gradient accumulation and checkpointing strategies
  • Optimize for multi-node, multi-GPU training efficiency
  • Create memory-efficient attention mechanisms (Flash Attention)
  • Design pipeline parallelism for massive model training
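
Below is a minimal sketch of FSDP sharding in PyTorch, assuming a model built from standard transformer layers and a process group already initialized via torchrun. The shard_model helper, the wrapping policy, and the bf16 settings are illustrative choices; production setups (DeepSpeed, Megatron-LM, pipeline stages) would look different.

import functools
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

def shard_model(model: nn.Module, block_cls=nn.TransformerEncoderLayer):
    # Treat each transformer block as its own FSDP unit: parameters, gradients,
    # and optimizer state are sharded across ranks and gathered only when needed.
    wrap_policy = functools.partial(
        transformer_auto_wrap_policy, transformer_layer_cls={block_cls},
    )
    bf16 = MixedPrecision(param_dtype=torch.bfloat16,
                          reduce_dtype=torch.bfloat16,
                          buffer_dtype=torch.bfloat16)
    return FSDP(model, auto_wrap_policy=wrap_policy, mixed_precision=bf16)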

Model Optimization

  • Implement model compression techniques (pruning, quantization)
  • Design efficient inference systems
  • Create dynamic batching and sequence optimization strategies
  • Implement KV-cache optimization for autoregressive generation (a cache sketch follows this list)
  • Design speculative decoding and parallel sampling methods
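
To illustrate the KV-cache bullet, here is a simple preallocated cache in PyTorch: past keys and values are written once, so each decode step attends over the stored prefix instead of re-encoding it. The class name, layout, and dtype are assumptions; production systems add paging, sliding windows, or quantized caches.

import torch

class KVCache:
    # One cache per attention layer, preallocated to the maximum sequence length.
    def __init__(self, batch, n_kv_heads, max_seq, head_dim,
                 dtype=torch.bfloat16, device="cpu"):
        shape = (batch, n_kv_heads, max_seq, head_dim)
        self.k = torch.zeros(shape, dtype=dtype, device=device)
        self.v = torch.zeros(shape, dtype=dtype, device=device)
        self.pos = 0

    def append(self, k_new, v_new):
        # k_new, v_new: (batch, n_kv_heads, t, head_dim) for t freshly decoded tokens.
        t = k_new.shape[2]
        self.k[:, :, self.pos:self.pos + t] = k_new
        self.v[:, :, self.pos:self.pos + t] = v_new
        self.pos += t
        # Views over the filled prefix, fed to attention at the next step.
        return self.k[:, :, :self.pos], self.v[:, :, :self.pos]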

Technical Leadership

  • Lead architecture review meetings and design decisions
  • Mentor junior researchers and model engineers
  • Document architectural decisions and trade-off analyses
  • Collaborate with HPC team for infrastructure optimization
  • Drive technical standards and best practices across teams

Mandatory Technical Requirements

Educational Foundation

  • PhD/MS: Computer Science, AI/ML, or related technical field
  • Research Background: 3+ published LLM papers in top-tier ML conferences
  • Model Experience: Designed and trained models with 10B+ parameters
  • Industry Impact: Contributed to production-grade AI systems

Architectural Expertise

class ArchitecturalRequirements {
public:
    // Transformer Variants
    struct ModelExpertise {
        bool gpt_family  = true;  // GPT-3/4, PaLM, LLaMA
        bool bert_family = true;  // BERT, RoBERTa, DeBERTa
        bool t5_family   = true;  // T5, UL2, PaLM-2
        bool multimodal  = true;  // CLIP, DALL-E, Flamingo
    };

    // Attention Mechanisms
    struct AttentionExpertise {
        bool flash_attention  = true;  // Memory-efficient attention
        bool sparse_attention = true;  // Longformer, BigBird
        bool linear_attention = true;  // Performer, Linear Transformer
        bool gqa_mqa          = true;  // Grouped/Multi-Query Attention
    };

    // Advanced Techniques
    struct AdvancedMethods {
        bool mixture_of_experts = true;  // Switch, GLaM, PaLM
        bool position_encoding  = true;  // RoPE, ALiBi, NoPE
        bool model_parallelism  = true;  // Tensor/Pipeline parallel
    };
};

Implementation Skills

  • Deep Learning Frameworks: PyTorch (expert), JAX, TensorFlow
  • High-Performance Computing: CUDA kernels, Triton, custom operators (a minimal Triton kernel follows this list)
  • Distributed Systems: FSDP, DeepSpeed, Megatron-LM, FairScale
  • Optimization Libraries: Flash Attention, xFormers, TensorRT
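
For flavor, a minimal Triton kernel of the kind this work involves: a fused elementwise add + ReLU with masked loads to handle a ragged final block. The kernel and wrapper names and the block size are illustrative; it assumes contiguous CUDA tensors and an installed triton package.

import torch
import triton
import triton.language as tl

@triton.jit
def fused_add_relu_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements                 # guard the final partial block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, tl.maximum(x + y, 0.0), mask=mask)

def fused_add_relu(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    fused_add_relu_kernel[grid](x, y, out, n, BLOCK=1024)
    return out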

Essential Experience

  • 5-10+ years in machine learning and model architecture
  • Production deployment of large-scale transformer models
  • Open-source contributions to major ML frameworks
  • Track record of architecting systems handling billions of parameters

Technology Stack

Transformers, PyTorch, CUDA, Flash Attention, RoPE/ALiBi, MoE, GQA/MQA, FSDP, DeepSpeed, TensorRT, Triton, custom CUDA kernels, Megatron-LM, JAX, xFormers, FairScale

What We Offer

Cutting-Edge Infrastructure

Access to world-class GPU clusters, latest H100/H200 hardware, and state-of-the-art development environments for pushing the boundaries of AI architecture.

Elite Compensation Package

Industry-leading salary competitive with top global AI companies, equity participation, performance bonuses, and comprehensive benefits package.

Research Excellence

Opportunity to publish groundbreaking research, attend top conferences, collaborate with world-class scientists, and contribute to open-source AI frameworks.

Strategic National Impact

Direct contribution to India's AI sovereignty while architecting systems that will define the future of artificial intelligence and shape technological independence.

Performance Expectations

  • 50B+ parameters
  • 10K+ GPU hours/week
  • 100x efficiency gains
  • 32K+ context length

Candidate Profile

You are among the world's leading experts in foundation model architecture, with deep expertise spanning transformer design, attention mechanisms, and large-scale model optimization. Your work has advanced the state-of-the-art in AI model architecture, and you have a proven track record of designing production systems that serve millions of users.

Non-Negotiable Standards: You must demonstrate exceptional mastery in ALL architectural domains listed above. This role requires someone who can seamlessly transition between theoretical research, system design, and high-performance implementation while maintaining the highest standards of technical excellence.

You thrive on solving complex architectural challenges and understand that your work will directly impact India's technological sovereignty. You are passionate about advancing both AI research and practical applications that serve strategic national interests.

Develop the Architecture of India's AI Future

Join India's most ambitious AI initiative and architect the scientific foundation model that will define our nation's technological independence.

Selection Process:

Architecture assessment (120 min) → System design challenge (take-home) → Technical deep-dive (150 min) → Research presentation (90 min) → Final evaluation with leadership

Note: Only candidates demonstrating world-class expertise in foundation model architecture will advance to final rounds.