A Summary of Common Flume Problems and Their Solutions

This article collects some of the problems I have run into while using Flume, along with the solutions I worked out for them.

Closing file: *.tmp failed. Will retry again in 180 seconds.

After a Flume collector has been running for a long time, an HDFS close failure occasionally occurs, and the exception then keeps reappearing in the Flume log. The log details are as follows:

27 Mar 2015 17:38:08,066 WARN  [hdfs-hdfs-access-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter$4.call:387)  - Closing file: hdfs://.../1427344254600.lzo.tmp failed. Will retry again in 180 seconds.
java.nio.channels.ClosedChannelException
at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1317)
at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1630)
at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1590)
at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1575)
at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:121)
at org.apache.flume.sink.hdfs.HDFSCompressedDataStream.close(HDFSCompressedDataStream.java:149)
at org.apache.flume.sink.hdfs.BucketWriter$3.call(BucketWriter.java:341)
at org.apache.flume.sink.hdfs.BucketWriter$3.call(BucketWriter.java:335)
at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:718)
at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:183)
at org.apache.flume.sink.hdfs.BucketWriter.access$1700(BucketWriter.java:59)
at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:715)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

This happens because Flume's BucketWriter tries to close the HDFSCompressedDataStream, but the close operation fails and throws an exception. By default, Flume retries closing the output stream indefinitely, once every 180 seconds, so the exception shows up in the log every 180 seconds.

The simplest fix is to stop Flume from retrying the close indefinitely: configure hdfs.closeTries to a fixed value, for example 3.
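
A minimal sketch of this setting, assuming a hypothetical agent named a1 with an HDFS sink named k1 (substitute your own agent and sink names):

# Hypothetical agent/sink names; the hdfs.closeTries line is the actual fix.
# Give up after 3 close attempts instead of retrying forever (the default, 0, retries indefinitely).
a1.sinks.k1.hdfs.closeTries = 3

If your Flume version also exposes hdfs.retryInterval, that property controls the 180-second gap between close attempts seen in the log above.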

At a deeper level, close failures are usually caused by an unstable HDFS NameNode, which makes write/close operations time out. Flume's default timeout is 10 seconds; configuring hdfs.callTimeout to a larger value lowers the probability of this exception occurring.
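
Again as a sketch with the same hypothetical agent/sink names, raising the timeout to 60 seconds might look like:

# Hypothetical agent/sink names; hdfs.callTimeout is in milliseconds (default 10000, i.e. 10 seconds).
a1.sinks.k1.hdfs.callTimeout = 60000

A larger timeout gives a slow or overloaded NameNode more time to answer write/close calls, at the cost of holding sink threads longer when HDFS is genuinely unreachable.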