Hadoop HDFS Explained


HDFS Architecture

HDFS is the primary distributed storage system used by Hadoop applications. An HDFS cluster consists of one NameNode, one Secondary NameNode and a number of DataNodes: the NameNode manages the file system metadata, while the DataNodes store the actual data. Clients go through the NameNode to fetch or modify file metadata, but the actual file I/O is performed directly against the DataNodes.

This article describes the role and working principles of each component in an HDFS cluster, along with some of its important features. Some commonly used HDFS features are listed below:

  • Rack awareness: the physical location of nodes is taken into account when scheduling tasks and allocating storage.
  • Safe mode: an administrative mode used for maintenance.
  • fsck: a tool for diagnosing the health of the file system and finding missing files or blocks (see the example command after this list).
  • Rebalancer: rebalances data across the cluster when it is unevenly distributed among DataNodes.
  • Upgrade and rollback: if something goes wrong after a software upgrade, HDFS can be rolled back to its pre-upgrade state.
  • Secondary NameNode: performs periodic checkpoints of the namespace and keeps the size of the HDFS edit log on the NameNode under a configured limit.
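For example, fsck is driven from the command line; a couple of illustrative invocations (exact options and output vary by Hadoop version):

    hdfs fsck / -files -blocks -locations    # check the whole namespace, listing files, blocks and their locations
    hadoop fsck /                            # equivalent entry point on older Hadoop releases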

Advantages of HDFS:

(1) Suited to big-data processing (supports GB-, TB- and PB-scale data, and file counts in the millions and beyond).

(2) Suited to batch processing (offline bulk processing with high throughput).

(3) High fault tolerance (data is stored as blocks with multiple replicas, which also makes load balancing easy).

Disadvantages of HDFS:

(1) Poor fit for small files (they consume a large amount of NameNode memory and waste disk space).

(2) No concurrent writes (only one writer at a time, and no random modification of existing data).

Roles in HDFS

The file system namespace

HDFS supports a traditional hierarchical file organization. Users or applications can create directories and store files inside them. The namespace hierarchy is similar to most existing file systems: users can create, delete, move and rename files. HDFS exposes the file system namespace and lets users store data in it as files.

Common HDFS file system commands:

Command                              Description
hadoop fs -mkdir mydir               Create a directory (mydir) in HDFS
hadoop fs -ls                        List files and directories in HDFS
hadoop fs -cat myfile                View a file's content
hadoop fs -du                        Check disk space usage in HDFS
hadoop fs -expunge                   Empty the trash on HDFS
hadoop fs -chgrp hadoop file1        Change group membership of a file
hadoop fs -chown huser file1         Change file ownership
hadoop fs -rm file1                  Delete a file in HDFS
hadoop fs -touchz file2              Create an empty file
hadoop fs -stat file1                Check the status of a file
hadoop fs -test -e file1             Check if file1 exists on HDFS
hadoop fs -test -z file1             Check if file1 is empty on HDFS
hadoop fs -test -d file1             Check if file1 is a directory on HDFS

Commands for uploading and downloading files:

Command                                              Description
hadoop fs -copyFromLocal <source> <destination>      Copy from the local filesystem to HDFS
hadoop fs -copyFromLocal file1 data                  Copies file1 from the local FS to the data dir in HDFS
hadoop fs -copyToLocal <source> <destination>        Copy from HDFS to the local filesystem
hadoop fs -copyToLocal data/file1 /var/tmp           Copies file1 from the HDFS data directory to /var/tmp on the local FS
hadoop fs -put <source> <destination>                Copy from the local filesystem to HDFS
hadoop fs -get <source> <destination>                Copy from HDFS to the local filesystem

hadoop distcp hdfs://192.168.0.8:8020/input hdfs://192.168.0.8:8020/output  
Copy data from one cluster to another using the cluster URL

hadoop fs -mv file:///data/datafile /user/hduser/data   
Move a data file from the local filesystem to HDFS

NameNode

HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode and a number of DataNodes. The NameNode is a central server that:

(1) Manages namespace operations on the file system, such as opening, closing and renaming files and directories.

(2) Determines the mapping of blocks to DataNodes (file-to-block and block-to-DataNode mappings).

(3) Monitors the health of the DataNodes.

(4) Coordinates data access.

DataNodes serve read and write requests from file system clients and perform block creation, deletion and replication under the direction of the NameNode.

The main characteristics of the NameNode can be summarized as follows:

(1) At runtime the NameNode keeps all metadata in memory, so the number of files the whole HDFS can store is limited by the NameNode's memory size.

(2) Each block corresponds to one record in the NameNode (a record typically takes about 150 bytes), so a large number of small files consumes a lot of memory.

(3) The NameNode's metadata is persisted to local disk, but block locations are not persisted; they are reported by DataNodes at registration time and maintained at runtime.

(4) If the NameNode fails, the whole HDFS becomes unavailable, so its availability must be protected.
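A rough, purely illustrative back-of-the-envelope calculation using the ~150 bytes-per-record figure above shows why object count, rather than raw data volume, is usually the first limit a NameNode hits:

    100,000,000 blocks x ~150 bytes/record ≈ 15 GB of NameNode heap
    (each file and directory adds its own record on top of the block records)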

  • Persistence of file system metadata

The NameNode stores the HDFS namespace. Any change to the file system metadata is recorded by the NameNode in a transaction log called the EditLog. For example, creating a file in HDFS causes the NameNode to insert a record into the EditLog; changing a file's replication factor likewise inserts a record. The NameNode stores the EditLog in its local operating system's file system. The entire file system namespace, including the mapping of blocks to files and the file attributes, is stored in a file called the FsImage, which also resides in the NameNode's local file system.

The NameNode keeps an image of the entire namespace and the file-to-block map (BlockMap) in memory. This key metadata structure is designed to be compact, so a NameNode with 4 GB of RAM can support a very large number of files and directories. When the NameNode starts up, it reads the EditLog and FsImage from disk, applies all the transactions in the EditLog to the in-memory FsImage, saves this new version of the FsImage back to local disk, and then discards the old EditLog, since its transactions are now reflected in the FsImage. This process is called a checkpoint. In the implementation described here, a checkpoint only occurs when the NameNode starts up; support for periodic checkpoints was planned for the near future.
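On more recent HDFS releases a checkpoint can also be forced by hand with the dfsadmin tool; a hedged sketch (the NameNode must be in safe mode for saveNamespace to be accepted):

    hdfs dfsadmin -safemode enter     # stop accepting namespace changes
    hdfs dfsadmin -saveNamespace      # merge the edit log into a fresh fsimage on disk
    hdfs dfsadmin -safemode leave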

  • Metadata disk failure

The FsImage and EditLog are HDFS's core data structures. If these files are corrupted, the entire HDFS instance is lost. For that reason, the NameNode can be configured to maintain multiple copies of the FsImage and EditLog. Any update to either file is synchronously applied to all copies. This synchronous replication may reduce the number of namespace transactions the NameNode can process per second, but the cost is acceptable because HDFS applications are data-intensive rather than metadata-intensive. When the NameNode restarts, it picks the most recent consistent FsImage and EditLog to use.

  • Limitations of a single NameNode

The NameNode is the single point of failure of an HDFS cluster. If the NameNode machine fails, manual intervention is required. At the time this was written, automatic restart or failover of the NameNode to another machine had not been implemented.

Having a single NameNode greatly simplifies the system architecture, but it also brings the following problems:

(1) The NameNode carries too many responsibilities, and the single point of failure cannot be avoided.

(2) The biggest bottleneck is memory: all of the NameNode's metadata lives in the memory of a single server, so the storage capacity is bounded by that server's memory. By one estimate, a server with 100 GB of RAM can only hold metadata for files numbering in the hundreds of millions.

(3) As the cluster grows, metadata read and write requests grow with it, so metadata access performance is also limited by what a single server can handle.

How, then, can the NameNode be scaled? One feasible approach is to split its responsibilities.

The NameNode's responsibilities fall into two parts: 1) namespace management; 2) block management (file-to-block and block-to-DataNode mappings). A separation-of-concerns solution is therefore to have the NameNode handle only namespace management and to introduce a new role into HDFS that is dedicated to block management (file-to-block and block-to-DataNode mappings).

Secondary NameNode

The NameNode appends changes to the file system to an edits log file on its local file system. When a NameNode starts, it first reads the state of HDFS from an image file (fsimage) and then applies the operations in the edits log. It then writes the new HDFS state back to the fsimage and starts normal operation with an empty edits file. Because the NameNode only merges fsimage and edits during startup, the edits log can grow very large over time, especially on a large cluster. A side effect of a huge edits log is that the next NameNode restart takes a long time.

The Secondary NameNode periodically (hourly by default) merges the fsimage and the edits log, keeping the size of the edits log under a limit. It connects to the NameNode and pulls the current fsimage and edits files, merges them into a new file, keeps a local copy and sends the result back to the NameNode. If the NameNode goes down, the files saved on the Secondary NameNode can be used to help recover it. On a busy cluster, an administrator can configure a shorter interval, for example every minute.

Because its memory requirements are of the same order as the NameNode's, the Secondary NameNode usually runs on a different machine. It is started by bin/start-dfs.sh on the nodes listed in conf/masters.

To summarize, the Secondary NameNode:
1) is not a backup of the NameNode;
2) periodically merges the fsimage and edits log and pushes the result back to the NameNode;
3) can assist in recovering the NameNode.
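The checkpoint interval mentioned above is driven by configuration; a sketch for hdfs-site.xml, assuming the Hadoop 2.x property name (Hadoop 1.x uses fs.checkpoint.period instead):

    <property>
      <name>dfs.namenode.checkpoint.period</name>
      <!-- seconds between checkpoints; lower it on busy clusters -->
      <value>3600</value>
    </property>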

DataNode

The main characteristics of a DataNode are:

(1) Stores the actual block data.

(2) Handles data read/write operations and block replication.

(3) Reports the blocks it currently stores to the NameNode at startup, and continues to report changes periodically afterwards.

(4) Communicates with other DataNodes to replicate blocks and maintain data redundancy.

  • Data organization

HDFS is designed for large files and for applications that process large data sets. These applications write their data once but read it one or more times, and need read speeds that support streaming access. HDFS supports write-once-read-many semantics for files. A large file is split into blocks, which are stored on different DataNodes. If a file is smaller than one block, it only occupies as much space as its actual size.

A DataNode stores HDFS data as files in its local file system and has no knowledge of HDFS files as such. It stores each HDFS block in a separate file in the local file system, and each block is replicated to multiple machines, 3 copies by default. The block is the basic unit of storage in a DataNode (reads and writes always operate on a whole block), with a default size of 64 MB. Large blocks are used mainly because they:

(1) Reduce seek time: disk transfer rates are much higher than seek speeds, so larger blocks amortize seeks;

(2) Reduce the metadata overhead of managing blocks, since every block needs a record on the NameNode;

(3) Reduce the cost of setting up network connections when reading and writing blocks.
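The block size is configurable; a sketch for hdfs-site.xml, assuming the Hadoop 2.x property name (Hadoop 1.x calls it dfs.block.size and defaults to the 64 MB mentioned above):

    <property>
      <name>dfs.blocksize</name>
      <!-- 128 MB, expressed in bytes -->
      <value>134217728</value>
    </property>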

  • Data replication

HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks of a file except the last one are the same size. For fault tolerance, every block of a file is replicated. The block size and replication factor are configurable per file. Applications can specify the number of replicas of a file; the replication factor can be set at file creation time and changed later. Files in HDFS are write-once, and there is strictly one writer at any time.
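For example, the replication factor of an existing file can be changed with setrep (the path is a placeholder):

    hadoop fs -setrep -w 2 /user/hduser/data/file1    # -w waits until re-replication has finished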

The NameNode makes all decisions regarding block replication. It periodically receives a heartbeat and a block report (Blockreport) from each DataNode in the cluster. Receipt of a heartbeat implies that the DataNode is functioning properly; a block report contains a list of all the blocks on that DataNode.

The placement of replicas is critical to HDFS reliability and performance. An optimized replica placement policy is one of the features that distinguishes HDFS from most other distributed file systems; it requires a lot of tuning and accumulated experience. HDFS uses a rack-aware policy to improve data reliability, availability and network bandwidth utilization. Large HDFS instances usually run on clusters that span multiple racks, and communication between machines on different racks has to go through switches. In most cases, the bandwidth between two machines in the same rack is greater than the bandwidth between two machines in different racks.

Through a rack-awareness process, the NameNode determines the rack id of each DataNode. A simple but non-optimal policy is to place the replicas on different racks. This prevents data loss when an entire rack fails and allows reads to use the bandwidth of multiple racks. It also spreads replicas evenly across the cluster, which helps with load balancing when components fail. However, a write then has to transfer the block to multiple racks, which increases the cost of writes.

In the common case, with a replication factor of 3, HDFS's placement policy is to put one replica on a node in the local rack, another on a different node in the same rack, and the last one on a node in a different rack. This policy reduces inter-rack traffic and therefore improves write performance. Rack failures are far less common than node failures, so the policy does not hurt data reliability or availability. At the same time, because the blocks are placed on two racks rather than three, it reduces the aggregate network bandwidth needed for reads. With this policy, replicas are not distributed evenly across racks: one third of the replicas are on one node, two thirds are on one rack, and the remaining replicas are distributed evenly across the other racks. This improves write performance without sacrificing data reliability or read performance.
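The NameNode does not discover rack membership on its own; it is told about it through an administrator-supplied mapping script. A sketch for core-site.xml, assuming the Hadoop 2.x property name and a hypothetical script path (the script receives IPs/hostnames and prints rack ids such as /rack1):

    <property>
      <name>net.topology.script.file.name</name>
      <value>/etc/hadoop/conf/rack-topology.sh</value>
    </property>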

To reduce overall bandwidth consumption and read latency, HDFS tries to satisfy a read from the replica closest to the reader. If a replica exists on the same rack as the reader, that replica is preferred. If an HDFS cluster spans multiple data centers, a client will also prefer a replica in the local data center.

The 3-way replication of block data in HDFS uses pipelined replication: each node receives data from the previous node and forwards it to the next one at the same time, so the data flows from one DataNode to the next in a pipeline.

The HDFS file write process works as follows:

(1) The client writes the file into a temporary local file on its own disk (staging).

(2) When the temporary file reaches one block in size, the HDFS client contacts the NameNode to request a write.

(3) The NameNode creates the file in the HDFS namespace and returns the block id and the list of target DataNodes to the client.

(4) With this information, the client writes the temporary file to the DataNodes:

4.1 The client writes the file content to the first DataNode (generally transferred in 4 KB units).
4.2 The first DataNode writes the data to its local disk and, at the same time, forwards it to the second DataNode.
4.3 And so on up to the last DataNode: data is replicated between DataNodes in a pipeline.
4.4 After receiving the data, each downstream DataNode sends an acknowledgement back to the previous one, and finally the first DataNode returns an acknowledgement to the client.
4.5 When the client has received the acknowledgement for the whole block, it sends a final confirmation to the NameNode.
4.6 If a write to some DataNode fails, the data continues to be written to the other DataNodes, and the NameNode later picks another healthy DataNode to replicate to, preserving redundancy.
4.7 Each block has a checksum, stored in a separate file, which is used to verify the block's integrity on reads.

(5) When the file is finished (the client closes it), the NameNode commits the file. Only then does the file become visible; if the NameNode crashes before the commit, the file is lost. (fsync only guarantees that the metadata has been written to the NameNode, not that the data has reached the DataNodes.)

When a file's replication factor is reduced, the NameNode selects the excess replicas to delete and passes that information to the DataNodes on the next heartbeat. The DataNodes then remove the corresponding blocks, and the cluster's free space increases.

HDFS read path:

  • 1. To read data from HDFS, the client first calls the open() method provided by HDFS; HDFS asks the NameNode which DataNodes hold the blocks of the target file.
  • 2. The NameNode returns the locations of the DataNodes holding each block replica (sorted by their distance from the client).
  • 3. The client reads the block data from these DataNodes; when one block is finished, it moves on to the nodes holding the next block, until the whole file has been read.
  • 4. After each block is read, a checksum verification is performed; if an error is found, the client notifies the NameNode and continues reading from the next DataNode that holds the block.
  • Data reliability

An important goal of HDFS is to store data reliably even when NameNodes, DataNodes or the network (partitions) fail. The first step in overcoming failures is detecting them. HDFS uses heartbeat messages to detect connectivity between the NameNode and the DataNodes. Several kinds of failures can cause that connectivity to be lost, so every DataNode sends periodic heartbeat messages to its NameNode; if the NameNode stops receiving them, connectivity is assumed to be lost.

The NameNode marks DataNodes that do not respond to heartbeats as dead and stops sending requests to them. Data stored on a dead node is no longer available to HDFS clients through that node, and the node is effectively removed from the system. If the loss of a node causes the replication factor of some blocks to drop below the configured minimum, the NameNode initiates additional replication to bring the replication factor back to normal.

Each DataNode sends a heartbeat to the NameNode every 3 seconds, implemented over a TCP connection on TCP port 9000 in this setup. Every tenth heartbeat is accompanied by a block report, in which the DataNode lists the blocks it stores. The block reports let the NameNode rebuild its metadata and make sure that every block has enough copies, distributed across different racks.

  • Rebalancer

Data in HDFS is not always distributed evenly across DataNodes. One common reason is that new DataNodes are regularly added to an existing cluster.

When a new block is added (a file's data is stored as a series of blocks), the NameNode considers many factors before choosing the DataNodes that will receive it. Some of these considerations are:

(1) Place one replica of the block on the node that is writing it.
(2) Spread the replicas of a block across racks, so the cluster can survive the complete loss of a rack.
(3) Place one replica on another node in the same rack as the writer, to reduce cross-rack network I/O.
(4) Spread HDFS data as uniformly as possible across the DataNodes in the cluster.

Because these considerations have to be traded off against each other, data may not end up uniformly distributed across the DataNodes. HDFS provides a tool (the balancer command) that analyzes block placement and rebalances data across DataNodes.
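The balancer is typically started as follows; the threshold is the allowed deviation, in percentage points, of each DataNode's utilization from the cluster average:

    hdfs balancer -threshold 10      # older releases use 'hadoop balancer' or start-balancer.sh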

Important HDFS features

  • Communication protocols

All HDFS communication protocols are layered on top of TCP/IP. A client connects to the NameNode on a configurable TCP port and talks to it using the ClientProtocol; DataNodes talk to the NameNode using the DataNodeProtocol. A remote procedure call (RPC) abstraction wraps both ClientProtocol and DataNodeProtocol. By design, the NameNode never initiates RPCs; it only responds to RPC requests from clients and DataNodes.

  • Safe mode

After starting, the NameNode enters a special state called safe mode. While in safe mode, it does not replicate blocks. The NameNode receives heartbeats and block reports from all DataNodes; a block report lists all the blocks a DataNode holds. Every block has a specified minimum number of replicas. A block is considered safely replicated once the NameNode has confirmed that this minimum is met. After a configurable percentage of blocks have been confirmed as safely replicated (plus an additional 30 seconds), the NameNode exits safe mode. It then determines which blocks still have fewer replicas than required and replicates them to other DataNodes.
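Safe mode can also be inspected and controlled manually with dfsadmin, for example:

    hdfs dfsadmin -safemode get      # is the NameNode currently in safe mode?
    hdfs dfsadmin -safemode enter    # enter safe mode by hand (e.g. for maintenance)
    hdfs dfsadmin -safemode leave    # leave safe mode without waiting for the block threshold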

  • Data integrity

A block fetched from a DataNode may be corrupt, whether because of faults in the DataNode's storage device, network errors or software bugs. The HDFS client software implements checksum verification of HDFS file contents. When a client creates an HDFS file, it computes a checksum for each block and stores the checksums in a separate hidden file in the same HDFS namespace. When the client retrieves file contents, it verifies that the data received from each DataNode matches the checksum stored in the corresponding checksum file; if not, the client can choose to fetch that block from another DataNode that holds a replica.

  • Space reclamation

When a user or application deletes a file, it is not immediately removed from HDFS. Instead, HDFS renames it into the /trash directory. As long as the file remains in /trash, it can be restored quickly. The time a file is kept in /trash is configurable; when it expires, the NameNode deletes the file from the namespace, which releases the blocks associated with it. Note that there can be an appreciable delay between the user deleting a file and the corresponding increase in free space in HDFS.
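How long files stay in the trash is controlled by fs.trash.interval in core-site.xml (minutes; 0 disables the trash feature); an illustrative setting:

    <property>
      <name>fs.trash.interval</name>
      <!-- keep deleted files for 24 hours -->
      <value>1440</value>
    </property>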


[Ubuntu] How to change the MySQL data default directory

MySQL is a widely used and fast SQL database server. It is a client/server implementation that consists of a server daemon (mysqld) and many different client programs/libraries.

If you have not yet installed the MySQL database server on Ubuntu, do that first.
What is the MySQL data directory?

The MySQL data directory is where all MySQL databases are stored. By default it is located in /var/lib/mysql. If you are running out of space in the /var partition, you need to move it to some other location.

Note: this is for advanced users only; before moving the default directory, make a backup of your MySQL databases.

Procedure to follow

Open the terminal

First, stop MySQL using the following command

sudo /etc/init.d/mysql stop

Now copy the existing data directory (located in /var/lib/mysql by default) using the following command

sudo cp -R -p /var/lib/mysql /path/to/new/datadir

All you need are the data files, so delete the others with the command

sudo rm /path/to/new/datadir

Note: you will get a message about not being able to delete some directories, but that's what you want.

Now edit the MySQL configuration file with the following command

sudo vim /etc/mysql/my.cnf

Look for the entry for “datadir”, and change the path (which should be “/var/lib/mysql”) to the new data directory.
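After the edit, the relevant part of /etc/mysql/my.cnf should look roughly like this (the new path is a placeholder):

    [mysqld]
    ...
    datadir = /path/to/new/datadir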

Important note: from Ubuntu 7.10 (Gutsy Gibbon) onward, Ubuntu ships security software called AppArmor, which restricts the areas of the filesystem that applications are allowed to access. Unless you modify the AppArmor profile for MySQL, you will not be able to restart MySQL with the new datadir location.

In the terminal, enter the command

sudo vim /etc/apparmor.d/usr.sbin.mysqld

Copy the lines beginning with “/var/lib/mysql”, comment out the originals with hash marks (“#”), and paste the lines below the originals.

Now change “/var/lib/mysql” in the two new lines with “/path/to/new/datadir”. Save and close the file.

Restart the AppArmor profiles with the command

sudo /etc/init.d/apparmor reload

Restart MySQL with the command

sudo /etc/init.d/mysql restart

Now MySQL should start with no errors, and your data will be stored in the new data directory location.

Ubuntu: changing the MySQL data directory

https://stackoverflow.com/questions/17968287/how-to-find-the-mysql-data-directory-from-command-line-in-windows

https://www.digitalocean.com/community/tutorials/how-to-move-a-mysql-data-directory-to-a-new-location-on-ubuntu-16-04

https://askubuntu.com/questions/790685/cannot-set-a-different-database-directory-for-mysql-errcode-13-permission-d

rsync helper script

#!/bin/bash 
  
# This script starts/stops the rsync daemon service 
# date: 2012/2/13 
  
status1=$(ps -ef | egrep "rsync --daemon.*rsyncd.conf" | grep -v 'grep') 
pidfile="/var/run/rsyncd.pid" 
start_rsync="rsync --daemon --config=/etc/rsyncd.conf" 
  
function rsyncstart() { 
  
    # re-check the daemon status each time the function is called 
    status1=$(ps -ef | egrep "rsync --daemon.*rsyncd.conf" | grep -v 'grep') 
  
    if [ "${status1}X" == "X" ];then 
  
        rm -f $pidfile       
  
        ${start_rsync}   
  
        status2=$(ps -ef | egrep "rsync --daemon.*rsyncd.conf" | grep -v 'grep') 
          
        if [  "${status2}X" != "X"  ];then 
              
            echo "rsync service start.......OK" 
              
        fi 
  
    else 
  
        echo "rsync service is running !"    
  
    fi 
} 
  
function rsyncstop() { 
  
    # re-check the daemon status each time the function is called 
    status1=$(ps -ef | egrep "rsync --daemon.*rsyncd.conf" | grep -v 'grep') 
  
    if [ "${status1}X" != "X" ];then 
      
        kill -9 $(cat $pidfile) 
  
        status2=$(ps -ef | egrep "rsync --daemon.*rsyncd.conf" | grep -v 'grep') 
  
        if [ "${statusw2}X" == "X" ];then 
              
            echo "rsync service stop.......OK" 
        fi 
    else 
  
        echo "rsync service is not running !"    
  
    fi 
} 
  
  
function rsyncstatus() { 
  
  
    if [ "${status1}X" != "X" ];then 
  
        echo "rsync service is running !"   
      
    else 
  
         echo "rsync service is not running !"  
  
    fi 
  
} 
  
function rsyncrestart() { 
  
    if [ "${status1}X" == "X" ];then 
  
               echo "rsync service is not running..." 
  
               rsyncstart 
        else 
  
               rsyncstop 
  
               rsyncstart    
  
        fi       
}  
  
case $1 in 
  
        "start") 
               rsyncstart 
                ;; 
  
        "stop") 
               rsyncstop 
                ;; 
  
        "status") 
               rsyncstatus 
               ;; 
  
        "restart") 
               rsyncrestart 
               ;; 
  
        *) 
          echo 
                echo  "Usage: $0 start|stop|restart|status" 
          echo 
esac
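The script is used like any init-style helper (the file name rsyncd.sh below is just an example), and it assumes a daemon configuration at /etc/rsyncd.conf; a minimal illustrative config with a placeholder module:

    ./rsyncd.sh start|stop|status|restart

    # /etc/rsyncd.conf (minimal sketch)
    pid file = /var/run/rsyncd.pid
    port = 873

    [backup]
        path = /data/backup
        read only = no
        comment = backup module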

Passwordless SSH login configuration on Linux

Background:
There are many Linux systems to maintain, so passwordless SSH login needs to be configured. Both one-way and two-way setups are covered below.

Environment:
master:
192.168.38.45
slaves:
192.168.38.58
192.168.38.60
First, configure one-way passwordless SSH login from the master to the slaves.

One-way setup:
1. On the master and all slaves, run the following as user yourname:

ssh-keygen -t dsa -P '' -f /home/yourname/.ssh/id_dsa  

2. In the /home/yourname/.ssh directory on the master, run:

cat id_dsa.pub > authorized_keys  

3. Copy the master's authorized_keys to the same directory on every slave:

scp   /home/yourname/.ssh/authorized_keys yourname@192.168.38.58:/home/yourname/.ssh/  
scp   /home/yourname/.ssh/authorized_keys yourname@192.168.38.60:/home/yourname/.ssh/  

Now the master can log in to the slaves without a password.

If you want passwordless login in both directions among all machines, follow the steps below.

Two-way setup:
1. On the master and all slaves, run the following as user yourname:

ssh-keygen -t dsa -P '' -f /home/yourname/.ssh/id_dsa  

2. In the /home/yourname/.ssh directory on the master, run:

cat id_dsa.pub > authorized_keys   

3. Copy the master's authorized_keys to the same directory on one slave (192.168.38.58):

scp   /home/yourname/.ssh/authorized_keys yourname@192.168.38.58:/home/yourname/.ssh/   

4. On 192.168.38.58, append its own public key to authorized_keys:

 cat id_dsa.pub >> authorized_keys   

5. Copy the authorized_keys from 192.168.38.58 to 192.168.38.60 and append that host's key as well:

scp   /home/yourname/.ssh/authorized_keys yourname@192.168.38.60:/home/yourname/.ssh/  
cat id_dsa.pub >> authorized_keys  
 

6. authorized_keys now contains the id_dsa.pub of every machine, so copy it back to the other nodes:

scp   /home/yourname/.ssh/authorized_keys yourname@192.168.38.58:/home/yourname/.ssh/  
scp   /home/yourname/.ssh/authorized_keys yourname@192.168.38.45:/home/yourname/.ssh/ 

Passwordless login now works in both directions.
PS: every node's id_dsa.pub must be added to authorized_keys. What if a new node joins the cluster? Append its key to authorized_keys and scp the file out to all nodes again.

Be sure to verify the following:
1. /etc/ssh/sshd_config has AuthorizedKeysFile .ssh/authorized_keys configured.
2. If it is not configured, set it and restart sshd: /etc/rc.d/init.d/sshd restart
3. Permissions:
chmod 700 .ssh
chmod 600 authorized_keys

Redis persistence: RDB and AOF


Redis supports two persistence mechanisms: RDB and AOF.

RDB

RDB persistence is the Redis default. It produces a point-in-time snapshot of the data set; the RDB file is a compressed binary file, and with RDB persistence the server only ever keeps one RDB file, which makes it simple to maintain.

  • Each RDB save writes the complete in-memory data set to the file; it is not an incremental persistence (every save rewrites everything, including unchanged data).
  • When writing an RDB file, Redis first writes the data to a temporary file and then replaces the old RDB file with it.

1. Generating the RDB file:
RDB persistence can be triggered manually or run periodically according to the server configuration.
1) The save and bgsave commands (manual commands for generating an RDB file):

  • save: blocks the Redis server process until the RDB file has been created (the server cannot handle any requests during that time).
  • bgsave: forks a child process to create the RDB file while the parent keeps serving command requests (see the example below).
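For example, from redis-cli (the reply text is what BGSAVE prints; the LASTSAVE value shown here is an illustrative timestamp):

    127.0.0.1:6379> BGSAVE
    Background saving started
    127.0.0.1:6379> LASTSAVE
    (integer) 1550000000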

2) Automatic saves:
The server can be configured, via the save option in the configuration file, to run bgsave automatically every so often. For example, the save directives in the configuration file:
save 900 1
save 300 10
save 60 10000

The server internally maintains a dirty counter and a lastsave attribute:

  • dirty: counts how many times the data set has been modified (writes, deletes, updates and so on) since the last successful save or bgsave;
  • lastsave: a UNIX timestamp recording the last successful save or bgsave.

The server's periodic function serverCron runs every 100 milliseconds by default; one of its main jobs is to check whether any of the conditions set by the save option is satisfied and, if so, run bgsave. The check uses the current system time and the values of dirty and lastsave.

Additional notes:

  • While bgsave is running, save and bgsave commands sent by clients are rejected by the server (to avoid competing saves);
  • If bgsave is running, a bgrewriteaof command is deferred until the bgsave has finished;
  • If bgrewriteaof is running, a bgsave command is likewise deferred until bgrewriteaof has finished (both run in child processes, so there is no actual conflict; this is mainly a performance consideration).

2. Loading the RDB file:

  • Redis has no dedicated command for loading an RDB file; the file is loaded automatically at server startup if it is detected.
  • If AOF persistence is enabled, the server prefers the AOF file when restoring data.
  • While loading an RDB file, the server stays blocked until loading is complete.
  • During loading, keys are checked and expired keys are not loaded into the database.

3. Miscellaneous:
1) The rdbcompression setting in the configuration file (default yes) controls whether the RDB file is compressed; when compression is on, Redis compresses strings of 20 bytes or longer and stores shorter strings uncompressed.
2) Redis ships with an RDB file checking tool, redis-check-dump.
3) An RDB file can be dumped with the od command: [root@centOS1 dir]# od -oc dump.rdb
4) The RDB file name is set with dbfilename dump.rdb and its location with dir /var/lib/redis/ in the configuration file.

AOF

From the discussion above, RDB persistence is coarse-grained: if the server crashes, all changes since the last save or bgsave are lost. AOF persistence is much finer-grained: after a crash, only the operations that had not yet been appended to the AOF are lost.

1. How AOF works:

1) AOF persistence records the database state by saving the write commands executed by the Redis server; all commands written to the AOF file are stored in the Redis command request protocol format (which is plain text). At startup, the server restores its state by loading the AOF file and re-executing the commands in it.

2) The AOF file name is set with appendfilename appendonly.aof and its location with dir /var/lib/redis/ in the configuration file.

3) AOF involves three steps:

  • Command append: after executing a write command, the server appends it, in protocol format, to the end of the aof_buf buffer;
  • File write: the Redis server process is an event loop; at the end of each loop iteration it decides, based on the appendfsync setting in the configuration file, whether to write the contents of aof_buf to the AOF file;
  • File sync: flush the data from the in-memory buffers to disk (needed because of how operating systems buffer writes).

Note:

To improve write performance, modern operating systems usually keep written data in an in-memory buffer and only flush it to disk once the buffer fills up or a timeout expires.

4) The appendfsync option:

  • always: write everything in aof_buf to the AOF file and sync it every time;
  • everysec: write everything in aof_buf to the AOF file and, if the last sync of the AOF file was more than one second ago, sync it (handled by a dedicated thread);
  • no: write everything in aof_buf to the AOF file and let the operating system decide when to sync.

5) AOF efficiency versus safety (controlled by appendfsync):
always writes and syncs on every command, so it is the safest but also the slowest; everysec is fast enough while still giving good safety guarantees; no is the fastest but the least safe.
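A typical redis.conf fragment enabling AOF with the everysec policy discussed above, using the default file locations mentioned earlier:

    appendonly yes
    appendfilename "appendonly.aof"
    appendfsync everysec      # alternatives: always | no
    dir /var/lib/redis/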

2. Loading the AOF file:
The AOF file records every write command applied to the database, so the server only needs to re-execute the commands in it to restore the state it had before shutting down. The steps are:

  • create a fake client without a network connection;
  • parse and read one write command from the AOF file;
  • execute the write command through the fake client, repeating until the end of the file.

3. AOF rewriting:
Because the AOF records all write commands, the file keeps growing as the server runs. In practice, a single key may be modified many times and thus generate many write commands; the intermediate writes can be skipped and only the final value of the key needs to be recorded in the AOF, which shrinks the file.

1) AOF rewrite:
To solve the problem of AOF file growth, Redis provides AOF rewriting: it creates a new AOF file to replace the existing one. The new and old files describe the same database state, but the new file contains none of the wasteful redundant commands, so it is much smaller.

2) How it works:
An AOF rewrite does not read, parse or otherwise process the existing AOF file at all; it is implemented by reading the server's current database state. For each key, Redis reads its current value and writes a single command that recreates the key-value pair, replacing the many commands that previously recorded it. That is the whole principle of AOF rewriting.

Note:

To avoid overflowing the client output buffer when the commands are replayed, the rewrite process checks collections such as sets and lists with multiple elements; if the number of elements exceeds the Redis default of 64, it emits several commands instead of a single one.

3) The bgrewriteaof command and the AOF rewrite buffer:
Because Redis executes commands in a single process, the rewrite is done in a child process so that the server is not blocked while it runs; this is the bgrewriteaof command.

Doing the rewrite in a child process introduces another problem: while the child is rewriting the AOF, the parent process keeps serving command requests, and new commands may modify the database state, making the server's current state diverge from the state recorded in the rewritten AOF file. To solve this, Redis introduces the AOF rewrite buffer.

The AOF rewrite buffer is used from the moment the server creates the rewrite child process onward; for each command the server:

  • executes the client command;
  • appends the resulting write command to aof_buf (the regular AOF buffer);
  • appends the same write command to the AOF rewrite buffer.

aof_buf is still periodically written and synced to the existing AOF file, while commands issued during the rewrite also accumulate in the AOF rewrite buffer. When the child finishes the rewrite it signals the parent; on receiving the signal the parent blocks the service briefly and:

  • appends the contents of the AOF rewrite buffer to the new AOF file (so the new file matches the current database state);
  • atomically renames the new AOF file over the old one.

Note:

During the whole AOF rewrite, only this signal handling step blocks the server; once it is done, the parent process resumes serving requests.

4. If Redis started out persisting only with RDB and AOF is later enabled by putting appendonly yes in the configuration file, then after a restart Redis will load from the (new, empty) AOF file, so none of the previously saved data will be loaded into the database.
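A common way to avoid this (a sketch; check the documentation of your Redis version) is to enable AOF at runtime first, so the initial AOF file is generated from the data already in memory, and only afterwards make the setting permanent:

    redis-cli CONFIG SET appendonly yes    # triggers an AOF rewrite from the current dataset
    redis-cli CONFIG REWRITE               # persist the change to redis.conf (Redis >= 2.8)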

HAProxy Logging Explained

Log levels
The global section holds process-wide parameters; if a proxy instance defines no log parameters of its own and only has "log global", it uses the log settings defined in the global section.

log global
log <address> [len <length>] <facility> [<level> [<minlevel>]]

address is the destination address the logs are sent to:

            - an IPv4 address, default UDP port 514, e.g. 127.0.0.1:514
            - an IPv6 address, default UDP port 514
            - a filesystem path to a UNIX socket (the socket must be writable)

length is the maximum number of characters in a log line, in the range 80-65535, default 1024.
facility must be one of the 24 standard syslog facilities:

             kern   user   mail   daemon auth   syslog lpr    news
             uucp   cron   auth2  ftp    ntp    audit  alert  cron2
             local0 local1 local2 local3 local4 local5 local6 local7

level is the log level; an optional minimum level can also be given, so that only messages at or above it are sent:

            emerg  alert  crit   err    warning notice info  debug
Example :

log global

log 127.0.0.1:514 local0 notice         

log 127.0.0.1:514 local0 notice notice

log ${LOCAL_SYSLOG}:514 local0 notice 
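For the 127.0.0.1:514 examples above to produce anything, a local syslog daemon has to accept UDP log messages. A minimal rsyslog sketch using the legacy directive syntax (the file path is a common choice, not mandated by HAProxy); restart rsyslog after adding it:

    # /etc/rsyslog.d/haproxy.conf
    $ModLoad imudp
    $UDPServerAddress 127.0.0.1
    $UDPServerRun 514
    local0.*    /var/log/haproxy.log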

Log Formats
HAProxy supports 5 log formats.
1. Default format: this is very basic and rarely used. It only provides very basic information about the incoming connection: source IP:port, destination IP:port, and the name of the frontend.

Example :
    listen www
        mode http
        log global
        server srv1 127.0.0.1:8000

>>> Feb  6 12:12:09 localhost \
      haproxy[14385]: Connect from 10.0.1.2:33312 to 10.0.3.31:8012 \
      (www/HTTP)

    Field   Format                                Extract from the example above
      1   process_name '[' pid ']:'                            haproxy[14385]:
      2   'Connect from'                                          Connect from
      3   source_ip ':' source_port                             10.0.1.2:33312
      4   'to'                                                              to
      5   destination_ip ':' destination_port                   10.0.3.31:8012
      6   '(' frontend_name '/' mode ')'                            (www/HTTP)

2. TCP log format: enabled with "option tcplog". This format provides much richer information, such as timers, connection counts and queue sizes, and is the recommended format for pure TCP proxies.

The TCP log format is defined by the following string:
log-format %ci:%cp\ [%t]\ %ft\ %b/%s\ %Tw/%Tc/%Tt\ %B\ %ts\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq


Example :
    frontend fnt
        mode tcp
        option tcplog
        log global
        default_backend bck

    backend bck
        server srv1 127.0.0.1:8000

>>> Feb  6 12:12:56 localhost \
      haproxy[14387]: 10.0.1.2:33313 [06/Feb/2009:12:12:51.443] fnt \
      bck/srv1 0/0/5007 212 -- 0/0/0/0/3 0/0
  Field   Format                                Extract from the example above
      1   process_name '[' pid ']:'                            haproxy[14387]:
      2   client_ip ':' client_port                             10.0.1.2:33313
      3   '[' accept_date ']'                       [06/Feb/2009:12:12:51.443]
      4   frontend_name                                                    fnt
      5   backend_name '/' server_name                                bck/srv1
      6   Tw '/' Tc '/' Tt*                                           0/0/5007
      7   bytes_read*                                                      212
      8   termination_state                                                 --
      9   actconn '/' feconn '/' beconn '/' srv_conn '/' retries*    0/0/0/0/3
     10   srv_queue '/' backend_queue                                      0/0

Field descriptions

Detailed fields description :
  - "client_ip" is the IP address of the client which initiated the TCP
    connection to haproxy. If the connection was accepted on a UNIX socket
    instead, the IP address would be replaced with the word "unix". Note that
    when the connection is accepted on a socket configured with "accept-proxy"
    and the PROXY protocol is correctly used, then the logs will reflect the
    forwarded connection's information.

  - "client_port" is the TCP port of the client which initiated the connection.
    If the connection was accepted on a UNIX socket instead, the port would be
    replaced with the ID of the accepting socket, which is also reported in the
    stats interface.

  - "accept_date" is the exact date when the connection was received by haproxy
    (which might be very slightly different from the date observed on the
    network if there was some queuing in the system's backlog). This is usually
    the same date which may appear in any upstream firewall's log.

  - "frontend_name" is the name of the frontend (or listener) which received
    and processed the connection.

  - "backend_name" is the name of the backend (or listener) which was selected
    to manage the connection to the server. This will be the same as the
    frontend if no switching rule has been applied, which is common for TCP
    applications.

  - "server_name" is the name of the last server to which the connection was
    sent, which might differ from the first one if there were connection errors
    and a redispatch occurred. Note that this server belongs to the backend
    which processed the request. If the connection was aborted before reaching
    a server, "<NOSRV>" is indicated instead of a server name.

  - "Tw" is the total time in milliseconds spent waiting in the various queues.
    It can be "-1" if the connection was aborted before reaching the queue.
    See "Timers" below for more details.

  - "Tc" is the total time in milliseconds spent waiting for the connection to
    establish to the final server, including retries. It can be "-1" if the
    connection was aborted before a connection could be established. See
    "Timers" below for more details.

  - "Tt" is the total time in milliseconds elapsed between the accept and the
    last close. It covers all possible processing. There is one exception, if
    "option logasap" was specified, then the time counting stops at the moment
    the log is emitted. In this case, a '+' sign is prepended before the value,
    indicating that the final one will be larger. See "Timers" below for more
    details.

  - "bytes_read" is the total number of bytes transmitted from the server to
    the client when the log is emitted. If "option logasap" is specified, the
    this value will be prefixed with a '+' sign indicating that the final one
    may be larger. Please note that this value is a 64-bit counter, so log
    analysis tools must be able to handle it without overflowing.

  - "termination_state" is the condition the session was in when the session
    ended. This indicates the session state, which side caused the end of
    session to happen, and for what reason (timeout, error, ...). The normal
    flags should be "--", indicating the session was closed by either end with
    no data remaining in buffers. See below "Session state at disconnection"
    for more details.

  - "actconn" is the total number of concurrent connections on the process when
    the session was logged. It is useful to detect when some per-process system
    limits have been reached. For instance, if actconn is close to 512 when
    multiple connection errors occur, chances are high that the system limits
    the process to use a maximum of 1024 file descriptors and that all of them
    are used. See section 3 "Global parameters" to find how to tune the system.

  - "feconn" is the total number of concurrent connections on the frontend when
    the session was logged. It is useful to estimate the amount of resource
    required to sustain high loads, and to detect when the frontend's "maxconn"
    has been reached. Most often when this value increases by huge jumps, it is
    because there is congestion on the backend servers, but sometimes it can be
    caused by a denial of service attack.

  - "beconn" is the total number of concurrent connections handled by the
    backend when the session was logged. It includes the total number of
    concurrent connections active on servers as well as the number of
    connections pending in queues. It is useful to estimate the amount of
    additional servers needed to support high loads for a given application.
    Most often when this value increases by huge jumps, it is because there is
    congestion on the backend servers, but sometimes it can be caused by a
    denial of service attack.

  - "srv_conn" is the total number of concurrent connections still active on
    the server when the session was logged. It can never exceed the server's
    configured "maxconn" parameter. If this value is very often close or equal
    to the server's "maxconn", it means that traffic regulation is involved a
    lot, meaning that either the server's maxconn value is too low, or that
    there aren't enough servers to process the load with an optimal response
    time. When only one of the server's "srv_conn" is high, it usually means
    that this server has some trouble causing the connections to take longer to
    be processed than on other servers.

  - "retries" is the number of connection retries experienced by this session
    when trying to connect to the server. It must normally be zero, unless a
    server is being stopped at the same moment the connection was attempted.
    Frequent retries generally indicate either a network problem between
    haproxy and the server, or a misconfigured system backlog on the server
    preventing new connections from being queued. This field may optionally be
    prefixed with a '+' sign, indicating that the session has experienced a
    redispatch after the maximal retry count has been reached on the initial
    server. In this case, the server name appearing in the log is the one the
    connection was redispatched to, and not the first one, though both may
    sometimes be the same in case of hashing for instance. So as a general rule
    of thumb, when a '+' is present in front of the retry count, this count
    should not be attributed to the logged server.

  - "srv_queue" is the total number of requests which were processed before
    this one in the server queue. It is zero when the request has not gone
    through the server queue. It makes it possible to estimate the approximate
    server's response time by dividing the time spent in queue by the number of
    requests in the queue. It is worth noting that if a session experiences a
    redispatch and passes through two server queues, their positions will be
    cumulated. A request should not pass through both the server queue and the
    backend queue unless a redispatch occurs.

  - "backend_queue" is the total number of requests which were processed before
    this one in the backend's global queue. It is zero when the request has not
    gone through the global queue. It makes it possible to estimate the average
    queue length, which easily translates into a number of missing servers when
    divided by a server's "maxconn" parameter. It is worth noting that if a
    session experiences a redispatch, it may pass twice in the backend's queue,
    and then both positions will be cumulated. A request should not pass
    through both the server queue and the backend queue unless a redispatch
    occurs.

3. HTTP log format: enabled with "option httplog"; this is the recommended format for HTTP proxies.

The HTTP log format is defined by the following string:
    log-format %ci:%cp\ [%t]\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ST\ %B\ %CC\ \
               %CS\ %tsc\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %hr\ %hs\ %{+Q}r

Example :
    frontend http-in
        mode http
        option httplog
        log global
        default_backend bck

    backend static
        server srv1 127.0.0.1:8000

>>> Feb  6 12:14:14 localhost \
      haproxy[14389]: 10.0.1.2:33317 [06/Feb/2009:12:14:14.655] http-in \
      static/srv1 10/0/30/69/109 200 2750 - - ---- 1/1/1/1/0 0/0 {1wt.eu} \
      {} "GET /index.html HTTP/1.1"
  Field   Format                                Extract from the example above
      1   process_name '[' pid ']:'                            haproxy[14389]:
      2   client_ip ':' client_port                             10.0.1.2:33317
      3   '[' accept_date ']'                       [06/Feb/2009:12:14:14.655]
      4   frontend_name                                                http-in
      5   backend_name '/' server_name                             static/srv1
      6   Tq '/' Tw '/' Tc '/' Tr '/' Tt*                       10/0/30/69/109
      7   status_code                                                      200
      8   bytes_read*                                                     2750
      9   captured_request_cookie                                            -
     10   captured_response_cookie                                           -
     11   termination_state                                               ----
     12   actconn '/' feconn '/' beconn '/' srv_conn '/' retries*    1/1/1/1/0
     13   srv_queue '/' backend_queue                                      0/0
     14   '{' captured_request_headers* '}'                   {haproxy.1wt.eu}
     15   '{' captured_response_headers* '}'                                {}
      16   '"' http_request '"'                      "GET /index.html HTTP/1.1"

Field descriptions

Detailed fields description :
  - "client_ip" is the IP address of the client which initiated the TCP
    connection to haproxy. If the connection was accepted on a UNIX socket
    instead, the IP address would be replaced with the word "unix". Note that
    when the connection is accepted on a socket configured with "accept-proxy"
    and the PROXY protocol is correctly used, then the logs will reflect the
    forwarded connection's information.

  - "client_port" is the TCP port of the client which initiated the connection.
    If the connection was accepted on a UNIX socket instead, the port would be
    replaced with the ID of the accepting socket, which is also reported in the
    stats interface.

  - "accept_date" is the exact date when the TCP connection was received by
    haproxy (which might be very slightly different from the date observed on
    the network if there was some queuing in the system's backlog). This is
    usually the same date which may appear in any upstream firewall's log. This
    does not depend on the fact that the client has sent the request or not.

  - "frontend_name" is the name of the frontend (or listener) which received
    and processed the connection.

  - "backend_name" is the name of the backend (or listener) which was selected
    to manage the connection to the server. This will be the same as the
    frontend if no switching rule has been applied.

  - "server_name" is the name of the last server to which the connection was
    sent, which might differ from the first one if there were connection errors
    and a redispatch occurred. Note that this server belongs to the backend
    which processed the request. If the request was aborted before reaching a
    server, "<NOSRV>" is indicated instead of a server name. If the request was
    intercepted by the stats subsystem, "<STATS>" is indicated instead.

  - "Tq" is the total time in milliseconds spent waiting for the client to send
    a full HTTP request, not counting data. It can be "-1" if the connection
    was aborted before a complete request could be received. It should always
    be very small because a request generally fits in one single packet. Large
    times here generally indicate network trouble between the client and
    haproxy. See "Timers" below for more details.

  - "Tw" is the total time in milliseconds spent waiting in the various queues.
    It can be "-1" if the connection was aborted before reaching the queue.
    See "Timers" below for more details.

  - "Tc" is the total time in milliseconds spent waiting for the connection to
    establish to the final server, including retries. It can be "-1" if the
    request was aborted before a connection could be established. See "Timers"
    below for more details.

  - "Tr" is the total time in milliseconds spent waiting for the server to send
    a full HTTP response, not counting data. It can be "-1" if the request was
    aborted before a complete response could be received. It generally matches
    the server's processing time for the request, though it may be altered by
    the amount of data sent by the client to the server. Large times here on
    "GET" requests generally indicate an overloaded server. See "Timers" below
    for more details.

  - "Tt" is the total time in milliseconds elapsed between the accept and the
    last close. It covers all possible processing. There is one exception, if
    "option logasap" was specified, then the time counting stops at the moment
    the log is emitted. In this case, a '+' sign is prepended before the value,
    indicating that the final one will be larger. See "Timers" below for more
    details.

  - "status_code" is the HTTP status code returned to the client. This status
    is generally set by the server, but it might also be set by haproxy when
    the server cannot be reached or when its response is blocked by haproxy.

  - "bytes_read" is the total number of bytes transmitted to the client when
    the log is emitted. This does include HTTP headers. If "option logasap" is
    specified, the this value will be prefixed with a '+' sign indicating that
    the final one may be larger. Please note that this value is a 64-bit
    counter, so log analysis tools must be able to handle it without
    overflowing.

  - "captured_request_cookie" is an optional "name=value" entry indicating that
    the client had this cookie in the request. The cookie name and its maximum
    length are defined by the "capture cookie" statement in the frontend
    configuration. The field is a single dash ('-') when the option is not
    set. Only one cookie may be captured, it is generally used to track session
    ID exchanges between a client and a server to detect session crossing
    between clients due to application bugs. For more details, please consult
    the section "Capturing HTTP headers and cookies" below.

  - "captured_response_cookie" is an optional "name=value" entry indicating
    that the server has returned a cookie with its response. The cookie name
    and its maximum length are defined by the "capture cookie" statement in the
    frontend configuration. The field is a single dash ('-') when the option is
    not set. Only one cookie may be captured, it is generally used to track
    session ID exchanges between a client and a server to detect session
    crossing between clients due to application bugs. For more details, please
    consult the section "Capturing HTTP headers and cookies" below.

  - "termination_state" is the condition the session was in when the session
    ended. This indicates the session state, which side caused the end of
    session to happen, for what reason (timeout, error, ...), just like in TCP
    logs, and information about persistence operations on cookies in the last
    two characters. The normal flags should begin with "--", indicating the
    session was closed by either end with no data remaining in buffers. See
    below "Session state at disconnection" for more details.

  - "actconn" is the total number of concurrent connections on the process when
    the session was logged. It is useful to detect when some per-process system
    limits have been reached. For instance, if actconn is close to 512 or 1024
    when multiple connection errors occur, chances are high that the system
    limits the process to use a maximum of 1024 file descriptors and that all
    of them are used. See section 3 "Global parameters" to find how to tune the
    system.

  - "feconn" is the total number of concurrent connections on the frontend when
    the session was logged. It is useful to estimate the amount of resource
    required to sustain high loads, and to detect when the frontend's "maxconn"
    has been reached. Most often when this value increases by huge jumps, it is
    because there is congestion on the backend servers, but sometimes it can be
    caused by a denial of service attack.

  - "beconn" is the total number of concurrent connections handled by the
    backend when the session was logged. It includes the total number of
    concurrent connections active on servers as well as the number of
    connections pending in queues. It is useful to estimate the amount of
    additional servers needed to support high loads for a given application.
    Most often when this value increases by huge jumps, it is because there is
    congestion on the backend servers, but sometimes it can be caused by a
    denial of service attack.

  - "srv_conn" is the total number of concurrent connections still active on
    the server when the session was logged. It can never exceed the server's
    configured "maxconn" parameter. If this value is very often close or equal
    to the server's "maxconn", it means that traffic regulation is involved a
    lot, meaning that either the server's maxconn value is too low, or that
    there aren't enough servers to process the load with an optimal response
    time. When only one of the server's "srv_conn" is high, it usually means
    that this server has some trouble causing the requests to take longer to be
    processed than on other servers.

  - "retries" is the number of connection retries experienced by this session
    when trying to connect to the server. It must normally be zero, unless a
    server is being stopped at the same moment the connection was attempted.
    Frequent retries generally indicate either a network problem between
    haproxy and the server, or a misconfigured system backlog on the server
    preventing new connections from being queued. This field may optionally be
    prefixed with a '+' sign, indicating that the session has experienced a
    redispatch after the maximal retry count has been reached on the initial
    server. In this case, the server name appearing in the log is the one the
    connection was redispatched to, and not the first one, though both may
    sometimes be the same in case of hashing for instance. So as a general rule
    of thumb, when a '+' is present in front of the retry count, this count
    should not be attributed to the logged server.

  - "srv_queue" is the total number of requests which were processed before
    this one in the server queue. It is zero when the request has not gone
    through the server queue. It makes it possible to estimate the approximate
    server's response time by dividing the time spent in queue by the number of
    requests in the queue. It is worth noting that if a session experiences a
    redispatch and passes through two server queues, their positions will be
    cumulated. A request should not pass through both the server queue and the
    backend queue unless a redispatch occurs.

  - "backend_queue" is the total number of requests which were processed before
    this one in the backend's global queue. It is zero when the request has not
    gone through the global queue. It makes it possible to estimate the average
    queue length, which easily translates into a number of missing servers when
    divided by a server's "maxconn" parameter. It is worth noting that if a
    session experiences a redispatch, it may pass twice in the backend's queue,
    and then both positions will be cumulated. A request should not pass
    through both the server queue and the backend queue unless a redispatch
    occurs.

  - "captured_request_headers" is a list of headers captured in the request due
    to the presence of the "capture request header" statement in the frontend.
    Multiple headers can be captured, they will be delimited by a vertical bar
    ('|'). When no capture is enabled, the braces do not appear, causing a
    shift of remaining fields. It is important to note that this field may
    contain spaces, and that using it requires a smarter log parser than when
    it's not used. Please consult the section "Capturing HTTP headers and
    cookies" below for more details.

  - "captured_response_headers" is a list of headers captured in the response
    due to the presence of the "capture response header" statement in the
    frontend. Multiple headers can be captured, they will be delimited by a
    vertical bar ('|'). When no capture is enabled, the braces do not appear,
    causing a shift of remaining fields. It is important to note that this
    field may contain spaces, and that using it requires a smarter log parser
    than when it's not used. Please consult the section "Capturing HTTP headers
    and cookies" below for more details.

  - "http_request" is the complete HTTP request line, including the method,
    request and HTTP version string. Non-printable characters are encoded (see
    below the section "Non-printable characters"). This is always the last
    field, and it is always delimited by quotes and is the only one which can
    contain quotes. If new fields are added to the log format, they will be
    added before this field. This field might be truncated if the request is
    huge and does not fit in the standard syslog buffer (1024 characters). This
    is the reason why this field must always remain the last one.

4. CLF HTTP format: equivalent to the HTTP format, rendered in Common Log Format style.

The CLF HTTP log format is defined by the following string:
    log-format %{+Q}o\ %{-Q}ci\ -\ -\ [%T]\ %r\ %ST\ %B\ \"\"\ \"\"\ %cp\ \
               %ms\ %ft\ %b\ %s\ \%Tq\ %Tw\ %Tc\ %Tr\ %Tt\ %tsc\ %ac\ %fc\ \
               %bc\ %sc\ %rc\ %sq\ %bq\ %CC\ %CS\ \%hrl\ %hsl

5. Custom format

R	var	field name (8.2.2 and 8.2.3 for description)	type
%o	special variable, apply flags on all next var	
%B	bytes_read (from server to client)	numeric
H	%CC	captured_request_cookie	string
H	%CS	captured_response_cookie	string
%H	hostname	string
H	%HM	HTTP method (ex: POST)	string
H	%HP	HTTP request URI without query string (path)	string
H	%HQ	HTTP request URI query string (ex: ?bar=baz)	string
H	%HU	HTTP request URI (ex: /foo?bar=baz)	string
H	%HV	HTTP version (ex: HTTP/1.0)	string
%ID	unique-id	string
%ST	status_code	numeric
%T	gmt_date_time	date
%Tc	Tc	numeric
%Tl	local_date_time	date
H	%Tq	Tq	numeric
H	%Tr	Tr	numeric
%Ts	timestamp	numeric
%Tt	Tt	numeric
%Tw	Tw	numeric
%U	bytes_uploaded (from client to server)	numeric
%ac	actconn	numeric
%b	backend_name	string
%bc	beconn (backend concurrent connections)	numeric
%bi	backend_source_ip (connecting address)	IP
%bp	backend_source_port (connecting address)	numeric
%bq	backend_queue	numeric
%ci	client_ip (accepted address)	IP
%cp	client_port (accepted address)	numeric
%f	frontend_name	string
%fc	feconn (frontend concurrent connections)	numeric
%fi	frontend_ip (accepting address)	IP
%fp	frontend_port (accepting address)	numeric
%ft	frontend_name_transport (‘~’ suffix for SSL)	string
%lc	frontend_log_counter	numeric
%hr	captured_request_headers default style	string
%hrl	captured_request_headers CLF style	string list
%hs	captured_response_headers default style	string
%hsl	captured_response_headers CLF style	string list
%ms	accept date milliseconds (left-padded with 0)	numeric
%pid	PID	numeric
H	%r	http_request	string
%rc	retries	numeric
%rt	request_counter (HTTP req or TCP session)	numeric
%s	server_name	string
%sc	srv_conn (server concurrent connections)	numeric
%si	server_IP (target address)	IP
%sp	server_port (target address)	numeric
%sq	srv_queue	numeric
S	%sslc	ssl_ciphers (ex: AES-SHA)	string
S	%sslv	ssl_version (ex: TLSv1)	string
%t	date_time (with millisecond resolution)	date
%ts	termination_state	string
H	%tsc	termination_state with cookie status	string
R = Restrictions : H = mode http only ; S = SSL only

Example :

global
        maxconn 65535
        chroot /usr/local/haproxy
        uid 99
        gid 99
        daemon
        nbproc 1
        description haproxy
        pidfile /var/run/haproxy.pid
defaults
        log global
        mode http
        balance roundrobin
        option forceclose
        option dontlognull
        option redispatch
        option abortonclose
        log-format %ci:%cp\ [%t]\ %U\ %HM\ %HU\ %HV\ %ST\ %si:%sp

>>> Sep 12 10:17:52 localhost haproxy[22909]: 10.1.250.98:53300 [12/Sep/2016:10:17:52.532] 496 GET / HTTP/1.1 200 10.1.1.20:9090

Error log
Error log format:

 >>> Dec  3 18:27:14 localhost \
          haproxy[6103]: 127.0.0.1:56059 [03/Dec/2012:17:35:10.380] frt/f1: \
          Connection error during SSL handshake

  Field   Format                                Extract from the example above
      1   process_name '[' pid ']:'                             haproxy[6103]:
      2   client_ip ':' client_port                            127.0.0.1:56059
      3   '[' accept_date ']'                       [03/Dec/2012:17:35:10.380]
      4   frontend_name "/" bind_name ":"                              frt/f1:
      5   message                        Connection error during SSL handshake

Capturing HTTP headers

Example:
capture request header Host len 15
capture request header X-Forwarded-For len 15
capture request header Referer len 15

Capturing custom headers
frontend webapp
        bind *:80
        capture request header test len 20
        capture request header test2 len 20
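With captures like the ones above enabled, the captured request headers appear between the first pair of braces of the httplog line, separated by '|', in the order the capture statements were declared. Mirroring the earlier httplog example with illustrative values for Host, X-Forwarded-For and Referer:

    ... 1/1/1/1/0 0/0 {www.example.com|10.0.1.2|http://example.com/} {} "GET /index.html HTTP/1.1"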