Complete YAML Tutorial

What is YAML?

YAML (YAML Ain't Markup Language) is a human-friendly data serialization standard for all programming languages. It's often used for configuration files, inter-process messaging, and data persistence. Its focus on readability makes it popular for many DevOps and automation tools like Kubernetes and Ansible.

Basic Syntax

YAML relies on indentation (spaces, not tabs!) to define structure. It's case-sensitive.

Key-Value Pairs (Mappings/Dictionaries)

A mapping is the most fundamental building block: it associates a key with a value, written as the key, a colon, a space, and the value.

# Basic key-value pair
name: John Doe

# String values don't always need quotes
message: Hello, world!
number: 123
boolean: true

Lists (Sequences/Arrays)

Each list item goes on its own line and starts with a hyphen followed by a space.

# A list of strings
fruits:
  - Apple
  - Banana
  - Orange

# A list of numbers
prime_numbers:
  - 2
  - 3
  - 5
  - 7

# A list of mappings
people:
  - name: Alice
    age: 30
  - name: Bob
    age: 25

Indentation

Indentation defines the hierarchy. Each level of indentation indicates a nested structure.

# Nested structure
company:
  name: Acme Corp
  location: New York
  employees:
    - name: Alice
      role: Developer
    - name: Bob
      role: Manager
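
One detail the nested example above does not show: a list under a mapping key may sit at the same indentation as the key itself, which is the style the Kubernetes manifest later in this tutorial uses. A minimal sketch, with key names made up for illustration:

```yaml
# Both keys hold the same two-item list.
indented_list:
  - first
  - second

# The hyphens may also line up with the parent key.
unindented_list:
- first
- second
```

Either style works, but mixing them within one file tends to make the hierarchy harder to read.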

Comments

Comments start with a hash symbol (#) and are ignored by YAML parsers.

# This is a single-line comment
key: value # Inline comment

another_key:
  # This is a comment within a block
  nested_key: nested_value

Data Types

YAML supports various data types, often inferred automatically.

Strings

Can be plain or quoted. Quoting is necessary for strings that contain special characters or might be misinterpreted as other data types (e.g., numbers, booleans).

plain_string: This is a plain string.
quoted_string: "This string has spaces and special characters: #!@"
boolean_as_string: "Yes" # Quoted to avoid being parsed as boolean
number_as_string: "123"   # Quoted to avoid being parsed as integer
multiline_string_folded: >
  This is a long string that will be
  folded into a single line.

multiline_string_literal: |
  This is a
  literal block string.
  Each line break is preserved.
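
To make the difference between the two block styles concrete, the sketch below writes the values of the two block scalars above as double-quoted strings. It assumes the default chomping behavior (no - or + indicator after > or |), which keeps a single trailing newline.

```yaml
# Folded (>): the inner line break becomes a space, one trailing newline remains.
multiline_string_folded: "This is a long string that will be folded into a single line.\n"

# Literal (|): every line break is kept as written.
multiline_string_literal: "This is a\nliteral block string.\nEach line break is preserved.\n"
```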

Numbers (Integers, Floats)

integer: 100
negative_integer: -50
float: 3.14
scientific_notation: 1.2e+5

Booleans

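Written as true or false. YAML 1.1 parsers also accept yes/no and on/off (in lowercase, Capitalized, or UPPERCASE form), but the YAML 1.2 core schema recognizes only true and false, so a 1.2 parser loads the no and ON values below as strings. Prefer true/false for portability.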

is_active: true
allow_access: no
feature_enabled: ON

Null

Represented by null or ~.

empty_value: null
another_empty: ~

Dates and Times

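YAML 1.1 defines a timestamp type for ISO 8601 formatted dates and times, and many parsers convert such values to native date objects. Parsers that implement only the YAML 1.2 core schema load them as plain strings.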

date: 2023-10-27
datetime: 2023-10-27T10:30:00Z
local_datetime: 2023-10-27 10:30:00 -05:00

Advanced Features

Anchors and Aliases (Reusability)

An anchor (&name) marks a block of content and an alias (*name) reuses it elsewhere, so you can define the content once and follow the DRY (Don't Repeat Yourself) principle.

# Define an anchor for a common address
default_address: &addr
  street: 123 Main St
  city: Anytown
  zip: "12345"

user1:
  name: Alice
  address: *addr # Use the alias

user2:
  name: Bob
  address:
    <<: *addr # Merge the default address
    apt: 4B # Add a specific detail
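
For reference, this is roughly what the two users above look like once a parser has resolved the alias and the merge. Note that the << merge key comes from the YAML 1.1 type repository rather than the core specification, so this sketch assumes a parser that supports it (most widely used ones do).

```yaml
# user1: the alias is replaced by the anchored mapping.
user1:
  name: Alice
  address:
    street: 123 Main St
    city: Anytown
    zip: "12345"

# user2: the anchored keys are merged in, then apt is added alongside them.
user2:
  name: Bob
  address:
    street: 123 Main St
    city: Anytown
    zip: "12345"
    apt: 4B
```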

Tags (Explicit Typing)

You can explicitly specify a data type using tags, though it's rarely necessary as parsers usually infer correctly.

# Explicitly define string
price: !!str 123.45

# Explicitly define timestamp
start_time: !!timestamp 2023-01-01 12:00:00

# Custom tag (requires parser support)
# employee: !person
#   name: Carol
#   id: 789
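
For strings specifically, a tag and quoting achieve the same result. In the short sketch below (key names are illustrative), both values load as the string 123.45 rather than as a float:

```yaml
price_tagged: !!str 123.45   # forced to a string by the tag
price_quoted: "123.45"       # forced to a string by the quotes
```

Quoting is usually the more readable choice. Tags are mainly useful when a parser maps them to custom types.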

Block and Flow Styles for Collections

Beyond the common "block" style (indentation), YAML offers "flow" styles for more compact representations.

# Block style mapping
person:
  name: David
  age: 40

# Flow style mapping (looks like JSON)
person_flow: {name: Emily, age: 35}

# Block style sequence
colors:
  - Red
  - Green
  - Blue

# Flow style sequence
colors_flow: [Yellow, Purple, Orange]

Common Pitfalls

1. Indentation Errors

YAML is extremely sensitive to whitespace. Use spaces, not tabs, and ensure consistent indentation levels.

# Incorrect (using tabs or inconsistent spaces)
key:
	nested_key: value # Tab instead of spaces

# Correct
key:
  nested_key: value

2. Missing Colons or Hyphens

Forgetting these symbols can cause a parsing error or, worse, parse silently as a plain string instead of the mapping or list you intended.

# Incorrect
name John Doe # Missing colon

# Correct
name: John Doe

# Incorrect
fruits
  Apple # Missing hyphen

# Correct
fruits:
  - Apple

3. Unquoted Special Strings

Strings that look like numbers or booleans, or that contain special characters (:, -, {, [, #, &, *, !, |, >, ', ", %, @, `), should be quoted.

# Problem: "Yes" might be parsed as a boolean
answer: Yes

# Solution: Quote the string
answer: "Yes"

# Problem: Parsed as the integer 7, so the leading zeros are lost
id: 007

# Solution: Quote the string
id: "007"

Real-World Examples

Kubernetes Deployment

YAML is the primary language for defining Kubernetes resources.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

Ansible Playbook

Ansible uses YAML for its playbooks, which define automation tasks.

---
- name: Install Nginx and deploy website
  hosts: webservers
  become: yes
  tasks:
    - name: Update apt cache
      apt:
        update_cache: yes

    - name: Install Nginx
      apt:
        name: nginx
        state: present

    - name: Copy index.html
      copy:
        src: /path/to/local/index.html
        dest: /var/www/html/index.html
        mode: '0644'

    - name: Ensure Nginx is running and enabled
      systemd:
        name: nginx
        state: started
        enabled: yes

Docker Compose

Docker Compose uses YAML to define and run multi-container Docker applications.

version: '3.8'
services:
  web:
    image: nginx:latest
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - app
  app:
    build: .
    ports:
      - "5000:5000"
    environment:
      FLASK_ENV: production
      DATABASE_URL: postgres://user:password@db:5432/mydb
  db:
    image: postgres:13
    environment:
      POSTGRES_DB: mydb
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
    volumes:
      - db_data:/var/lib/postgresql/data
volumes:
  db_data:

YAML vs. JSON

YAML 1.2 was designed as a superset of JSON, so nearly every JSON document is also valid YAML. YAML, however, aims for greater human readability.

JSON Example:

{
  "name": "Jane",
  "age": 28,
  "hobbies": ["reading", "hiking"],
  "address": {
    "street": "456 Oak Ave",
    "city": "Springfield"
  }
}

Equivalent YAML Example:

name: Jane
age: 28
hobbies:
  - reading
  - hiking
address:
  street: 456 Oak Ave
  city: Springfield

YAML offers features like comments, anchors, and a more concise syntax (no curly braces or square brackets for mapping/sequence boundaries when using block style) that JSON lacks.
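
As a small illustration of the superset relationship, the JSON example above can be fed to a YAML parser unchanged, where it is read as flow-style collections. The sketch below does exactly that and adds comments, which JSON itself would reject:

```yaml
# Same data as the JSON example, read as YAML flow style.
{
  "name": "Jane",
  "age": 28,
  "hobbies": ["reading", "hiking"],   # a flow sequence
  "address": {
    "street": "456 Oak Ave",
    "city": "Springfield"
  }
}
```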

Tools and Validation

Several tools can help you write and validate YAML.
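
One such tool is the yamllint linter, which is itself configured in YAML. The sketch below is a possible .yamllint file. The choice of yamllint and the specific limits are assumptions made for illustration, not recommendations from this tutorial.

```yaml
# .yamllint (hypothetical project configuration)
extends: default

rules:
  indentation:
    spaces: 2                          # enforce two-space indentation
  line-length:
    max: 120                           # relax the default line limit
  truthy:
    allowed-values: ['true', 'false']  # flag yes/no/on/off used as booleans
```

With this file in the project root, running yamllint . checks every YAML file below the current directory.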

Note: YAML is a powerful and flexible data serialization language. While its human readability is a key advantage, mastering its indentation rules and special characters is crucial to avoid common errors. Always validate your YAML files, especially in production environments.