Key Responsibilities
Infrastructure & Environment Management
Manage the full lifecycle of environments (Dev / QA / Staging / Production) across cloud platforms
Design, provision, and maintain infrastructure primarily based on Kubernetes
Operate and manage Kubernetes clusters and containerized workloads in production
Ensure systems are scalable, highly available, and cost-efficient
Deployment, Automation & GitOps
Implement and manage GitOps-based deployments using ArgoCD or FluxCD on Kubernetes
Design and maintain CI/CD workflows aligned with GitOps practices
Build and manage infrastructure using Terraform and/or Ansible
Automate operational processes to reduce manual intervention
Develop internal tools and scripts using Go, Python, or TypeScript
Integrate systems with external services (GitHub/GitLab, Slack, webhooks, APIs)
Ensure consistent, repeatable, and auditable deployments across environments
Monitoring, Logging & Reliability
Implement and maintain observability using Prometheus, Grafana, and InfluxDB
Define alerting and monitoring strategies to ensure system health
Analyze metrics and logs to proactively identify issues and performance bottlenecks
Continuously improve system reliability and availability
Troubleshooting & Incident Management
Investigate and resolve production issues in Kubernetes-based distributed systems
Analyze logs and metrics to identify root causes
Work with engineering teams to implement fixes and long-term improvements
Participate in incident reviews and improve system resilience
Performance & Testing
Design and execute load and stress testing scenarios
Support creation of test environments and mock services when needed
Analyze performance results and recommend optimizations
Architecture & Delivery Support
Collaborate with developers and architects on system design (Kubernetes-native architectures)
Provide input on scalability, high availability, and resilience
Support project delivery by defining infrastructure and release requirements
Work with PMs to ensure smooth and timely releases
Documentation & Collaboration
Maintain clear documentation (runbooks, procedures, architecture diagrams)
Support knowledge sharing within the team and with customers
Participate in technical discussions with internal teams and external stakeholders
-----------------------------------------
Required Skills & Experience
Core Requirements
Strong hands-on Kubernetes experience in production environments (mandatory)
Proven experience with GitOps tools (ArgoCD or FluxCD)
Strong experience with Infrastructure as Code: Terraform (preferred) and/or Ansible
Solid Linux system administration skills
Experience working with at least one major cloud provider (AWS / Azure / GCP / Alibaba Cloud)
Programming / Automation
Hands-on coding experience in at least one: Go / Python / TypeScript
Ability to build tools, automation, and integrations (not just basic scripting)
Comfortable with a 'vibe coding' mindset: pragmatic, iterative development to solve real operational problems quickly
Observability
Experience with Prometheus, Grafana, and InfluxDB
Strong understanding of monitoring, alerting, and system reliability
Nice to Have
Experience with multi-cloud environments
Experience operating high-scale distributed systems
Profile
Senior-level experience in DevOps / SRE roles
Strong ownership and problem-solving mindset
Comfortable working across teams (Dev, QA, PM, Customers)
Strong troubleshooting and system-level thinking
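About Us
Wiredcraft was founded in 2009 and officially joined Publicis Groupe China in June 2022. We currently have offices in Shanghai and Singapore and serve clients across China and the Asia-Pacific region. Our team of more than 100 local and international professionals specializes in technology, design, R&D, product management, strategy consulting, and data, helping companies carry out omnichannel, cross-border digital transformation strategies.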
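How to Apply
(Apply using either of the following methods)
1. Company website:
https://wiredcraft.com/expertise/ Apply through the website link above, or visit our official recruitment platform for more hiring information.
2. Email:
[email protected] (send your Chinese and English résumés to this address; subject line format: Position | Name | From V2EX)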