Saturday 31 March 2018

History Server Permissions and Ownership Guide

This is a guide to resolving JobHistory Server (JHS) and Spark History Server (SHS) problems related to file ownership and permissions.

Description:::
Users are not able to see their jobs in the JobHistory Server (JHS). It appears that completed jobs are not being moved from the ResourceManager (RM) to the JHS.

Issue:::
When clicking on the 'History' link of a job within RM WebUI, an error message is reported:

"Error getting logs at <hostname-node1>:8041"

Notice that the MapReduce (MR) job history metadata files, e.g. the conf.xml and *.jhist files, are not being moved from the HDFS location /user/history/done_intermediate to /user/history/done. An HDFS listing of /user/history/done_intermediate/<username> shows many old files still present in this location.
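One way to gauge the backlog is to count the history metadata files still sitting under done_intermediate. A minimal sketch, assuming standard `hdfs dfs -ls -R` output fed on stdin (the path and <username> placeholder are the ones from this guide; the function name is made up for illustration):

```shell
#!/bin/sh
# Count MR history metadata files (*.jhist and *conf.xml) in an
# `hdfs dfs -ls -R` listing read from stdin. A large count under
# /user/history/done_intermediate/<username> suggests files are stuck.
count_history_files() {
    grep -cE '(\.jhist|conf\.xml)$'
}

# Typical use on the cluster (path from this guide):
#   hdfs dfs -ls -R /user/history/done_intermediate/userA | count_history_files
```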

Also, from JHS log, the following error may be present:

""
ERROR org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager: Error while trying to scan the directory
hdfs://host1.example.com:8020/user/history/done_intermediate/userA
org.apache.hadoop.security.AccessControlException: Permission denied: user=userB,
access=READ_EXECUTE, inode="/user/history/done_intermediate/userA":userA:hadoop:drwxrwx---
at
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:151)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
.......
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
""


Reason:::
This issue occurs because the "mapred" user is not in the "hadoop" group.

Therefore, the metadata files in /user/history/done_intermediate/<username> cannot be read by the mapred user as these directories and files are owned by <username>:hadoop with permission mode 770.
The mapred user needs to be in the hadoop group in order to read those metadata files and move them to /user/history/done/.

Resolution:::
To resolve the issue, add the mapred user to the hadoop group. Depending on how users and groups are managed in your environment, this may need to be done on the Linux side or on the LDAP/AD side.
On a basic Linux system you can add the group to the user's supplementary group list with the "usermod" command:
"sudo usermod -a -G hadoop mapred"

Additional information:::

YARN applications use different history mechanisms depending on the application type. The two most common application types are MapReduce (MR) and Spark. The JobHistory Server (JHS) is responsible for managing MR application history and the Spark History Server (SHS) is responsible for Spark history.

These application history mechanisms have a couple of things in common:
- Job summary files contain metadata such as performance counters and configuration.
- Container logs contain the direct output of the application as it ran in the YARN container.

When log aggregation is enabled for YARN, which is the default, the NodeManager (NM) is responsible for moving the container logs into HDFS when the job is complete (whether it failed or succeeded).
Each user has their own directory in HDFS for container logs. While the job is running, these container logs are kept in the NM's local directories on each node. When the job completes, the NM moves the container logs to /tmp/logs/<username>/logs/<application_id>.
The log files are owned by the user who ran the job, and the group ownership is "hadoop".
The group is what allows services such as the history server processes to read the files.
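The path rule above can be written down as a tiny helper. A minimal sketch, assuming the default YARN settings (/tmp/logs is the default remote log root and "logs" the default suffix); the function name is made up for illustration:

```shell
#!/bin/sh
# Build the default aggregated container-log directory for one application:
# <remote-log-root>/<username>/logs/<application_id>
aggregated_log_dir() {
    printf '/tmp/logs/%s/logs/%s\n' "$1" "$2"
}
```

For example, `aggregated_log_dir userA application_1522480000000_0001` prints `/tmp/logs/userA/logs/application_1522480000000_0001`.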

Additionally, there is a location in HDFS for metadata about each job, and the JHS needs to be able to read, and sometimes modify, these files in place.
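One way to sanity-check those HDFS locations is to compare ownership and mode against the <username>:hadoop / 770 pattern shown in the JHS error above. A sketch that parses "owner:group mode" strings, as produced by `hdfs dfs -stat '%u:%g %a' <dir>` on recent Hadoop releases; the helper name is hypothetical:

```shell
#!/bin/sh
# Given an "owner:group mode" string (e.g. "userA:hadoop 770"), succeed
# only if the group is hadoop and the mode's group digit grants rwx (7),
# i.e. the pattern the history server relies on to read the files.
history_dir_readable_by_group() {
    group=$(printf '%s' "$1" | cut -d' ' -f1 | cut -d: -f2)
    mode=$(printf '%s' "$1" | cut -d' ' -f2)
    [ "$group" = "hadoop" ] && [ "$(printf '%s' "$mode" | cut -c2)" = "7" ]
}

# Example (matches the inode shown in the JHS error):
#   hdfs dfs -stat '%u:%g %a' /user/history/done_intermediate/userA
```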
