How to Set Up Prometheus and Grafana
Prometheus and Grafana together form one of the most widely deployed open source monitoring stacks for cloud-native and traditional infrastructure. Prometheus handles metrics collection and storage using its pull-based scraping model, while Grafana provides the visualization, dashboarding, and alerting interface. Both tools are mature, well-documented, and backed by active communities. This guide covers a Linux-based deployment, though the same principles apply to container-based and Kubernetes deployments with minor adjustments.
Plan Your Monitoring Architecture
Before installing anything, decide what you need to monitor and how the components will communicate. For a small environment with fewer than 50 servers, a single Prometheus instance can handle collection, storage, and alerting. For larger environments, plan for a distributed architecture with multiple Prometheus servers, each responsible for a subset of targets, and a long-term storage backend like Thanos or Cortex.
Identify the targets you want to monitor. Linux servers need Node Exporter for hardware and OS metrics. Databases like PostgreSQL, MySQL, and MongoDB each have dedicated exporters. Container platforms expose metrics natively; Kubernetes, for example, provides built-in endpoints for node and pod metrics. Web servers, message brokers, and application frameworks often have Prometheus client libraries or community exporters available.
System requirements for Prometheus are modest for small deployments. A server with 2 CPU cores, 4 GB of RAM, and 50 GB of SSD storage can comfortably handle monitoring a few dozen hosts with two-week data retention. Memory consumption scales with the number of active time series rather than the number of monitored hosts, so environments with high-cardinality labels (such as per-request metrics) will need more RAM. Grafana requires minimal resources, typically 1 CPU core and 512 MB of RAM for small installations.
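As a rough sizing aid, the Prometheus storage documentation estimates disk usage as retention time in seconds multiplied by ingested samples per second and by bytes per sample, with roughly 1 to 2 bytes per sample after compression. The sketch below applies that formula. The workload figures (30 hosts, about 1,000 series per host) are assumptions chosen for illustration, not measurements.

```python
def estimate_disk_bytes(hosts, series_per_host, scrape_interval_s,
                        retention_days, bytes_per_sample=2.0):
    """Estimate TSDB disk usage: retention_seconds * samples_per_second * bytes_per_sample."""
    samples_per_second = hosts * series_per_host / scrape_interval_s
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Assumed workload: 30 hosts, ~1,000 series each, 15 s scrapes, 14 days retention.
    size = estimate_disk_bytes(30, 1000, 15, 14)
    print(f"{size / 1e9:.1f} GB")  # prints "4.8 GB"
```

Under these assumptions the estimate sits well inside the 50 GB suggested above, which leaves headroom for growth in series count or retention.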
Install and Configure Prometheus
Download the latest Prometheus release from the official GitHub repository. Extract the archive and move the prometheus and promtool binaries to /usr/local/bin/. Create a dedicated system user for Prometheus to run under, with no login shell and no home directory, following the principle of least privilege. Create the configuration directory at /etc/prometheus/ and the data directory at /var/lib/prometheus/, setting ownership to the Prometheus user.
The Prometheus configuration file, prometheus.yml, defines global settings, scrape intervals, and the list of targets to monitor. The global scrape_interval determines how frequently Prometheus polls its targets; 15 seconds is a common choice, and the built-in default is 1 minute if the setting is omitted. Each entry under scrape_configs defines a job name and a list of static or dynamically discovered targets. Start with a job that scrapes Prometheus's own metrics endpoint at localhost:9090, which verifies that the installation is working.
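A minimal sketch of such a configuration is shown here as a Python string, with a small helper that writes it to disk. The job name "prometheus" and the helper name write_config are illustrative choices. The helper takes the destination as an argument, so you can write to a scratch location first and copy the file to /etc/prometheus/prometheus.yml after review.

```python
from pathlib import Path

# Minimal configuration: one job that scrapes Prometheus's own metrics endpoint.
PROMETHEUS_YML = """\
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
"""


def write_config(path):
    """Write the minimal configuration to `path` and return it as a Path."""
    target = Path(path)
    target.write_text(PROMETHEUS_YML)
    return target
```

After writing the file, running promtool check config against it is a quick way to catch syntax errors before starting the server.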
Create a systemd service file for Prometheus that starts the server with appropriate flags, including the path to the configuration file (--config.file), the data storage directory (--storage.tsdb.path), and the retention period (--storage.tsdb.retention.time). A retention time of 15 to 30 days is typical for local storage (the default is 15 days), with longer-term data delegated to an external storage backend if needed. Start and enable the service, then verify Prometheus is running by accessing the web interface at port 9090. The Status > Targets page shows whether configured scrape targets are being reached successfully.
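The same target health information is available from the Prometheus HTTP API at /api/v1/targets, which is handy for scripted checks. The sketch below assumes Prometheus is listening on localhost:9090; the function names are illustrative.

```python
import json
from urllib.request import urlopen


def unhealthy_targets(payload):
    """Return the scrape URLs of active targets whose health is not 'up'."""
    targets = payload["data"]["activeTargets"]
    return [t["scrapeUrl"] for t in targets if t["health"] != "up"]


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch target status from the Prometheus HTTP API (base_url is an assumption)."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)
```

On the Prometheus host, unhealthy_targets(fetch_targets()) returns an empty list when every configured target is being scraped successfully.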
Deploy Node Exporter on Monitored Hosts
Node Exporter is the standard Prometheus exporter for Linux server metrics. It exposes CPU utilization, memory usage, disk I/O, network traffic, filesystem capacity, load averages, and dozens of other system-level metrics through an HTTP endpoint on port 9100. Install Node Exporter on every Linux server you want to monitor.
Download the Node Exporter binary, move it to /usr/local/bin/, and create a systemd service file that runs it as a dedicated system user. The default configuration exposes a comprehensive set of system metrics without any additional flags. Start the service and verify it is working by requesting the /metrics endpoint on port 9100, which should return a plain-text list of metric names and values.
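To give a sense of what the /metrics output looks like, here is a small parser for the text exposition format. It is a simplified sketch that assumes samples carry no trailing timestamp, and the sample values are made up for illustration.

```python
def parse_metrics(text):
    """Parse exposition-format lines into {series: value}.

    Simplified: skips comment lines and assumes no trailing timestamps.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


# Illustrative excerpt of Node Exporter output (values are invented).
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_filesystem_avail_bytes{device="/dev/sda1",mountpoint="/"} 1.5e+10
"""

if __name__ == "__main__":
    for series, value in parse_metrics(SAMPLE).items():
        print(series, value)
```

Lines beginning with # carry help text and type information; every other line is one sample, consisting of a metric name, optional labels in braces, and a numeric value.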
Back on the Prometheus server, add a new scrape_configs job for Node Exporter targets. List each monitored host by IP address or hostname with port 9100. For environments managed by configuration management tools like Ansible, Puppet, or Salt, generate the target list dynamically from inventory data. Prometheus also supports file-based service discovery, where targets are defined in a JSON or YAML file that Prometheus watches for changes, allowing target updates without restarting the Prometheus service. After adding targets, reload the Prometheus configuration (send SIGHUP to the process, or send an HTTP POST to /-/reload if the server was started with --web.enable-lifecycle) and verify the new targets appear as "UP" on the Status > Targets page.
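For file-based service discovery, the target file is a list of groups, each with a "targets" array and optional "labels". The sketch below generates that structure from a host list. The hostnames and the env label are hypothetical placeholders for whatever your inventory provides.

```python
import json


def build_file_sd(hosts, port=9100, labels=None):
    """Build a file_sd target group list for the given hosts."""
    group = {"targets": [f"{host}:{port}" for host in hosts]}
    if labels:
        group["labels"] = dict(labels)
    return [group]


if __name__ == "__main__":
    # Hypothetical inventory; replace with hosts exported from your own tooling.
    inventory = ["web01.example.com", "db01.example.com"]
    print(json.dumps(build_file_sd(inventory, labels={"env": "prod"}), indent=2))
```

Write the output to a file and reference its path under file_sd_configs (in the files list) of the Node Exporter job; Prometheus picks up later changes to the file without a reload.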
Install and Connect Grafana
Install Grafana from the official repository for your Linux distribution. Grafana packages are available for Debian/Ubuntu (apt), RHEL/CentOS (yum/dnf), and SUSE, as well as standalone binary downloads. Start and enable the Grafana systemd service, then access the web interface at port 3000. The default login credentials are admin/admin, and Grafana will prompt you to change the password on first login.
Add Prometheus as a data source through the Grafana web interface under Configuration > Data Sources > Add Data Source (in recent Grafana releases the same page is found under Connections > Data sources). Select Prometheus, enter the URL (typically http://localhost:9090 if Grafana and Prometheus run on the same host), and click Save & Test to verify connectivity. Once the data source is configured, Grafana can query Prometheus using PromQL and render the results in dashboards.
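The data source can also be created without the web interface by sending a POST request to Grafana's /api/datasources endpoint. The sketch below only builds the request. The Grafana URL, the credentials, and the data source name are placeholders, and sending the request is left to the caller.

```python
import base64
import json
from urllib.request import Request


def datasource_payload(name="Prometheus", url="http://localhost:9090"):
    """JSON body describing a Prometheus data source proxied through Grafana."""
    return {"name": name, "type": "prometheus", "url": url,
            "access": "proxy", "isDefault": True}


def build_request(grafana_url, user, password, payload):
    """Build (but do not send) a basic-auth POST to /api/datasources."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends it. For repeatable deployments, Grafana's file-based provisioning is an alternative that achieves the same result.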
Import community dashboards to get immediate value from your monitoring setup. The Grafana dashboard library at grafana.com/grafana/dashboards contains thousands of pre-built dashboards for common monitoring scenarios. Dashboard 1860 (Node Exporter Full) is one of the most popular dashboards for Linux server monitoring, providing comprehensive visualizations for CPU, memory, disk, network, and system metrics collected by Node Exporter. Import it using its dashboard ID, select your Prometheus data source, and you will have a fully functional server monitoring dashboard within seconds.
Grafana's alerting system evaluates rules against data source queries and delivers notifications through configured contact points. Create alert rules that trigger when metrics exceed thresholds, such as disk usage above 85% or CPU load average above the number of cores for more than 5 minutes. Configure contact points for your preferred notification channels, whether email, Slack, PagerDuty, Microsoft Teams, or webhook-based integrations. Alert rules can be organized into folders with notification policies that route alerts to specific contact points based on labels.
Configure Alertmanager
While Grafana provides its own alerting, the Prometheus ecosystem includes Alertmanager as a dedicated alert routing and notification component. Alertmanager receives alerts from Prometheus, deduplicates them, groups related alerts together, applies silencing rules during maintenance windows, and delivers notifications through configured receivers. For production deployments, running Alertmanager alongside Grafana alerting provides defense in depth, ensuring alerts are delivered even if Grafana is temporarily unavailable.
Install Alertmanager from the official release, create its configuration file at /etc/alertmanager/alertmanager.yml, and define receivers for your notification channels. A receiver specifies where alerts should be sent, such as an email address, a Slack webhook URL, or a PagerDuty integration key. The routing configuration determines which receiver handles which alerts based on alert labels. A common pattern is to route critical alerts to PagerDuty for immediate paging while sending warning-level alerts to a Slack channel for triage during business hours.
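The routing pattern described above can be modelled as "first matching route wins, otherwise use the default receiver". The sketch below is a deliberately simplified model of that idea, not Alertmanager's implementation: it ignores nested routes, regular-expression matchers, and grouping. The receiver names are hypothetical.

```python
def pick_receiver(alert_labels, routes, default_receiver):
    """Return the receiver of the first route whose match labels all equal the alert's labels."""
    for route in routes:
        if all(alert_labels.get(k) == v for k, v in route["match"].items()):
            return route["receiver"]
    return default_receiver


# Hypothetical routes: critical alerts page someone, warnings go to chat.
ROUTES = [
    {"match": {"severity": "critical"}, "receiver": "pagerduty"},
    {"match": {"severity": "warning"}, "receiver": "slack"},
]

if __name__ == "__main__":
    alert = {"alertname": "DiskFull", "severity": "critical"}
    print(pick_receiver(alert, ROUTES, "email"))  # prints "pagerduty"
```

An alert whose labels match no route falls through to the default receiver, so it is worth making that default a channel someone actually reads.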
Define alerting rules in Prometheus by creating rule files referenced in the prometheus.yml configuration. Each rule specifies a PromQL expression that, when true for a specified duration, fires an alert to Alertmanager. For example, a rule might fire when a host's available memory drops below 10% for more than 5 minutes, or when a disk partition exceeds 90% utilization. Test your alerting pipeline by intentionally triggering an alert condition and verifying that notifications arrive through your configured channels.
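The "true for a specified duration" behaviour is worth understanding before writing rules: an alert stays pending while its condition holds and only fires once the condition has held for the full duration. The sketch below simulates this for the low-memory example, whose condition could be written in PromQL as node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10. It is a simplified model with invented sample values, not Prometheus's rule evaluator.

```python
def alert_states(samples, threshold, for_seconds):
    """Classify each evaluation as inactive, pending, or firing.

    samples: list of (timestamp_seconds, value) in time order.
    The condition is value < threshold; it must hold for for_seconds to fire.
    """
    states = []
    pending_since = None
    for ts, value in samples:
        if value < threshold:
            if pending_since is None:
                pending_since = ts  # condition just became true
            state = "firing" if ts - pending_since >= for_seconds else "pending"
        else:
            pending_since = None  # any recovery resets the timer
            state = "inactive"
        states.append(state)
    return states


if __name__ == "__main__":
    # Invented data: available memory (%) evaluated once per minute.
    memory = [(0, 12), (60, 8), (120, 7), (180, 9),
              (240, 6), (300, 5), (360, 4), (420, 15)]
    print(alert_states(memory, threshold=10, for_seconds=300))
```

In this run the condition first becomes true at 60 seconds, so the alert is pending until 360 seconds, fires at that evaluation, and returns to inactive once memory recovers at 420 seconds.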
Secure and Harden the Stack
By default, Prometheus, Node Exporter, Grafana, and Alertmanager all serve plain HTTP without encryption, and Prometheus, Node Exporter, and Alertmanager require no authentication at all (Grafana has a login, but it ships with the well-known admin/admin default). For production deployments, place a reverse proxy like Nginx or Caddy in front of these services to provide TLS encryption and access control. Caddy is particularly convenient because, for a publicly resolvable hostname, it handles automatic certificate management through Let's Encrypt without additional configuration.
Restrict network access to monitoring ports using firewall rules. Node Exporter (port 9100) should only be accessible from the Prometheus server, not from the public internet. Prometheus (port 9090) and Alertmanager (port 9093) should be accessible only from the Grafana server and authorized administrative networks; Alertmanager must also accept connections from the Prometheus server, which pushes alerts to it. Grafana (port 3000) is the only component that typically needs broader network access, and it should be served behind the reverse proxy on port 443 with TLS.
Back up your Prometheus data directory and configuration files regularly. Re-scraping targets restores only current values; historical data cannot be regenerated, and it is valuable for trend analysis and capacity planning. Grafana's SQLite database (or PostgreSQL/MySQL if configured) contains dashboard definitions, user accounts, and alert configurations that should be backed up. Store backups off-server, ideally in a separate location from the monitoring infrastructure, so they survive the same failures the monitoring system is designed to detect.
A functional Prometheus and Grafana stack can be running within an hour for a small environment. The combination of Prometheus's reliable metrics collection, Grafana's flexible visualization, and community-maintained exporters and dashboards provides monitoring coverage that rivals commercial platforms at a fraction of the cost.