Logging, Monitoring, and Observability in Google Cloud (LMOGC)

Course Details

Online Training

Duration: 3 days

Who should attend

This class is intended for the following participants:

    • Cloud architects, administrators, and SysOps personnel
    • Cloud developers and DevOps personnel

Prerequisites

To get the most out of this course, participants should have:

    • Google Cloud Fundamentals: Core Infrastructure (GCF-CI) or equivalent experience
    • Basic scripting or coding familiarity
    • Proficiency with command-line tools and Linux operating system environments

Course Objectives

This course teaches participants the following skills:

    • Plan and implement a well-architected logging and monitoring infrastructure
    • Define Service Level Indicators (SLIs) and Service Level Objectives (SLOs)
    • Create effective monitoring dashboards and alerts
    • Monitor, troubleshoot, and improve Google Cloud infrastructure
    • Analyze and export Google Cloud audit logs
    • Find production code defects, identify bottlenecks, and improve performance
    • Optimize monitoring costs

Course Content

Module 1 – Introduction to Google Cloud Monitoring Tools

    • Understand the purpose and capabilities of Google Cloud operations-focused components: Logging, Monitoring, Error Reporting, and Service Monitoring
    • Understand the purpose and capabilities of Google Cloud application performance management-focused components: Debugger, Trace, and Profiler

Module 2 – Avoiding Customer Pain

    • Build a monitoring foundation on the four golden signals: latency, traffic, errors, and saturation
    • Measure customer pain with SLIs
    • Define critical performance measures
    • Create and use SLOs and SLAs
    • Achieve harmony between developers and operations with error budgets
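
As a rough illustration of how an SLO translates into an error budget, the following minimal Python sketch computes the allowed downtime for an availability target and the share of budget remaining from request counts. The function names, the 30-day window, and the sample figures are assumptions chosen for illustration; they are not taken from the course material.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over a window.

    slo_target is a fraction such as 0.999 (hypothetical example value).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent, based on event counts.

    A negative result means the budget is exhausted and the SLO is violated.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # A 99.9% availability SLO over 30 days (illustrative numbers only).
    print(f"Allowed downtime: {error_budget_minutes(0.999):.1f} minutes")
    # 10,000 requests, 9,950 of them good, against a 99% SLO.
    print(f"Budget remaining: {budget_remaining(0.99, 9950, 10000):.0%}")
```

Under these assumptions, a team that has spent most of its budget would typically slow feature releases and prioritize reliability work until the budget recovers.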

Module 3 – Monitoring Critical Systems

    • Choose best-practice monitoring project architectures
    • Differentiate Cloud IAM roles for monitoring
    • Use the default dashboards appropriately
    • Build custom dashboards to show resource consumption and application load
    • Define uptime checks to track aliveness and latency

Module 4 – Alerting Policies

    • Develop alerting strategies
    • Define alerting policies
    • Add notification channels
    • Identify types of alerts and common uses for each
    • Construct and alert on resource groups
    • Manage alerting policies programmatically

Module 5 – Advanced Logging and Analysis

    • Identify and choose among resource tagging approaches
    • Define log sinks (inclusion filters) and exclusion filters
    • Create metrics based on logs
    • Define custom metrics
    • Link application errors to Logging using Error Reporting
    • Export logs to BigQuery

Module 6 – Working with Audit Logs

    • Audit Logs
    • Data Access Logging
    • Audit Log Entry Format
    • Best Practices

Module 7 – Configuring Google Cloud Services for Observability

    • Integrate logging and monitoring agents into Compute Engine VMs and images
    • Enable and utilize Kubernetes Monitoring
    • Extend and clarify Kubernetes monitoring with Prometheus
    • Expose custom metrics through code, and with the help of OpenCensus

Module 8 – Monitoring Google Cloud VPC

    • Collect and analyze VPC Flow logs and Firewall Rules logs
    • Enable and monitor Packet Mirroring
    • Explain the capabilities of Network Intelligence Center
    • Use Admin Activity audit logs to track changes to the configuration or metadata of resources
    • Use Data Access audit logs to track accesses or changes to user-provided resource data
    • Use System Event audit logs to track Google Cloud administrative actions

Module 9 – Managing Incidents

    • Define incident management roles and communication channels
    • Mitigate incident impact
    • Troubleshoot root causes
    • Resolve incidents
    • Document incidents in a post-mortem process

Module 10 – Investigating Application Performance Issues

    • Debug production code to correct code defects
    • Trace latency through layers of service interaction to eliminate performance bottlenecks
    • Profile and identify resource-intensive functions in an application

Module 11 – Optimizing the Costs of Monitoring

    • Analyze resource utilization costs for monitoring-related components within Google Cloud
    • Implement best practices for controlling the cost of monitoring within Google Cloud