Saturday, 31 March 2018

History Server Permissions and Ownership Guide

This is a guide to resolving JobHistory Server (JHS) and Spark History Server (SHS) problems related to file ownership and permissions.

Description:::
Users cannot see their jobs in the JobHistory Server (JHS). It looks as though the jobs are not being moved from the ResourceManager (RM) to the JHS.

Issue:::
Clicking the 'History' link of a job in the RM web UI reports an error message:

"Error getting logs at <hostname-node1>:8041"

Notice that the MapReduce (MR) job history metadata, e.g. conf.xml and *.jhist files, are not being moved from HDFS location /user/history/done_intermediate to /user/history/done. An HDFS listing of /user/history/done_intermediate/<username> shows many old files still present in this location.
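
One way to confirm the backlog is to list a user's intermediate directory from the command line. The sketch below assumes the hdfs client is on the PATH and that the caller is allowed to read the directory (for example, the HDFS superuser). The helper name intermediate_dir and the sample user userA are illustrative only.

```shell
#!/bin/sh
# Build the per-user intermediate history path described above.
intermediate_dir() {
    printf '/user/history/done_intermediate/%s\n' "$1"
}

# "userA" is a placeholder; pass the real job owner as the first argument.
user=${1:-userA}

# List the directory only when the hdfs client is available on this host.
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -ls "$(intermediate_dir "$user")" || echo "listing failed" >&2
fi
```

Old conf.xml and *.jhist entries in this listing suggest that the JHS has not been able to move them to /user/history/done.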

The JHS log may also contain the following error:

""
ERROR org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager: Error while trying to scan the directory
hdfs://host1.example.com:8020/user/history/done_intermediate/userA
org.apache.hadoop.security.AccessControlException: Permission denied: user=userB,
access=READ_EXECUTE, inode="/user/history/done_intermediate/userA":userA:hadoop:drwxrwx---
at
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:151)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
.......
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
""


Reason:::
This issue occurs because the "mapred" user is not in the "hadoop" group.

Therefore, the metadata files in /user/history/done_intermediate/<username> cannot be read by the mapred user as these directories and files are owned by <username>:hadoop with permission mode 770.
The mapred user needs to be in the hadoop group in order to read those metadata files and move them to /user/history/done/.
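
To make the denial concrete, the inode fragment of the exception can be split into its path, owner, group, and mode. This is a small illustrative sketch in plain POSIX shell. The function name parse_inode is made up for this guide, and the sample string is copied from the log excerpt above.

```shell
#!/bin/sh
# Split an 'inode="path":owner:group:mode' fragment into its fields.
parse_inode() {
    q='"'
    rest=${1#inode=$q}      # drop the leading inode="
    path=${rest%%$q*}       # text up to the closing quote
    rest=${rest#*$q:}       # owner:group:mode
    owner=${rest%%:*}
    rest=${rest#*:}         # group:mode
    group=${rest%%:*}
    mode=${rest#*:}
    other=${mode#???????}   # last three characters: permissions for "other"
    printf 'path=%s owner=%s group=%s mode=%s other=%s\n' \
        "$path" "$owner" "$group" "$mode" "$other"
}

parse_inode 'inode="/user/history/done_intermediate/userA":userA:hadoop:drwxrwx---'
```

The "other" bits are "---", so an account that is neither the owner nor a member of the hadoop group has no access to the directory.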

Resolution:::
To resolve the issue, add the mapred user to the hadoop group. Depending on how your users and groups are defined, you may need to do this either on the Linux side or on the LDAP/AD side.
On a basic Linux system, you can add the group to the user's list of groups with the "usermod" command:
" sudo usermod -a -G hadoop mapred "

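After changing the membership, it can help to verify it. The following sketch checks the local OS view with id and, when the hdfs client is available, the group mapping as HDFS resolves it with "hdfs groups". The helper has_group is a hypothetical name used only for this example.

```shell
#!/bin/sh
# Return success when <group> appears in a space-separated group list.
has_group() {
    for g in $1; do
        if [ "$g" = "$2" ]; then
            return 0
        fi
    done
    return 1
}

# Local (OS-level) view of the mapred user's groups.
if has_group "$(id -Gn mapred 2>/dev/null)" hadoop; then
    echo "mapred is in the hadoop group"
else
    echo "mapred is NOT in the hadoop group (or the user does not exist on this host)"
fi

# HDFS resolves groups on the NameNode, so check its view as well when possible.
if command -v hdfs >/dev/null 2>&1; then
    hdfs groups mapred || echo "could not query the HDFS group mapping" >&2
fi
```

If HDFS still reports the old membership, the NameNode may be serving a cached group mapping. Running "hdfs dfsadmin -refreshUserToGroupsMappings" as the HDFS superuser is one way to refresh it.
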
Additional information:::

YARN applications use different history mechanisms depending on the application type. The two most common application types are MapReduce (MR) and Spark. The JobHistory Server (JHS) is responsible for managing MR application history, and the Spark History Server (SHS) is responsible for Spark history.

These application history mechanisms have a couple of things in common:
    - Job summary files contain metadata such as performance counters and configuration.
    - Container logs contain the direct output of the application as it ran in the YARN container.
When log aggregation is enabled for YARN, which is the default, the NodeManager (NM) is responsible for moving the container logs into HDFS when the job is complete (whether it succeeded or failed).
Each user has their own directory in HDFS for container logs. While the job is running, these container logs are kept in the NM local directories on the node. When the job completes, the NM moves the container logs to /tmp/logs/<username>/logs/<application_id>.
The log files will be owned by the user who ran the job, and the group ownership will be "hadoop".
The group is used to allow services such as the history server processes to read the files.
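
As a quick check of this layout, the sketch below builds the expected aggregated-log directory and lists its entry so that owner, group, and mode are visible. It assumes the /tmp/logs location given above. The user name, the application ID, and the helper name aggregated_log_dir are placeholders for illustration.

```shell
#!/bin/sh
# Build the aggregated-log directory for a user and application,
# following the /tmp/logs/<username>/logs/<application_id> layout.
aggregated_log_dir() {
    printf '/tmp/logs/%s/logs/%s\n' "$1" "$2"
}

user=userA                               # placeholder job owner
app_id=application_1522483200000_0001    # placeholder application ID

dir=$(aggregated_log_dir "$user" "$app_id")
echo "Expected log directory: $dir"

if command -v hdfs >/dev/null 2>&1; then
    # -d lists the directory entry itself rather than its contents.
    hdfs dfs -ls -d "$dir" || echo "directory not found or not readable" >&2
fi
```

The aggregated logs themselves can usually be read with "yarn logs -applicationId <application_id>", run as the job owner or as a user with access to that directory.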

In addition, there is an HDFS location for the metadata of each job. The JHS needs to be able to read, and sometimes modify, these files in their HDFS locations.

How to enable debug level logging for YARN and its dependent services

Summary: How to enable debug level logging for YARN and its dependent services without a restart.
Explanation:
With INFO level logging enabled, you may not be able to identify the issue you are troubleshooting. To get more granular logging, enable DEBUG level logging for the relevant YARN services.
The steps below let you configure this in production environments without restarting the YARN service.

Steps:
1) Open the ResourceManager web interface.
Ex: http://<your rm ip>:8088 (8088 is the default RM web UI port; use your cluster's port if it differs)

2) Now open the logging configuration page of the RM web UI.
Ex: http://<your rm ip>:8088/logLevel (use the same port as in step 1)
(Note the uppercase L in the URL.)

3) Decide which logger class you want to change the logging level for.
Ex: I am choosing 'RMAuditLogger'.

4) Check the current logging level of the chosen class.

5) Change the logging level of the chosen class to the desired level (in this case, from INFO to DEBUG).

6) Review the Resource Manager service log. After the change, you should see the DEBUG level log messages in the logs.
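
The same steps can also be scripted instead of clicked through. The sketch below only builds and prints the /logLevel URLs and the matching "yarn daemonlog" commands; it does not contact any server. The host name, the port, and the helper name log_level_url are placeholders, and the full logger name for RMAuditLogger is an assumption that should be checked against the class list on the /logLevel page.

```shell
#!/bin/sh
# Build a /logLevel URL for a daemon's HTTP address.
# With only a logger name the URL queries the current level;
# with a level as third argument it requests a change.
log_level_url() {
    url="http://$1/logLevel?log=$2"
    if [ -n "${3:-}" ]; then
        url="$url&level=$3"
    fi
    printf '%s\n' "$url"
}

# Placeholders: replace with your RM host and the port of its web UI.
rm_http=${RM_HTTP:-rm.example.com:8088}
# Assumed full logger name for the RMAuditLogger example from step 3.
logger=org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger

echo "Query : $(log_level_url "$rm_http" "$logger")"
echo "Change: $(log_level_url "$rm_http" "$logger" DEBUG)"

# The same two operations through the yarn CLI (printed, not executed here).
echo "yarn daemonlog -getlevel $rm_http $logger"
echo "yarn daemonlog -setlevel $rm_http $logger DEBUG"
```

A level changed this way applies only to the running process, so it reverts to the configured level the next time the service restarts. Set it back to INFO once troubleshooting is done to keep the logs small.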

Enabling debug mode logging - Hue

Up to CDH-5.6, follow the steps below in Cloudera Manager:

1) Hue --> Configuration.
2) Search for "Hue Service Environment Advanced Configuration Snippet (Safety Valve)" and add the properties below.
###

DEBUG=true
DESKTOP_DEBUG=true

 ###

3) Save the changes and restart the Hue service.
4) Collect the logs by logging in to the Linux machine where the Hue service is installed and navigating to the path below.
" /var/run/cloudera-scm-agent/process/<id>-hue-HUE_SERVER/logs "
(where <id> is the most recently created one. The debug logs do not show up in the default Hue log location, only in the process directory mentioned above.)
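
Finding the most recent process directory by hand can be tedious, so here is a minimal sketch that picks it automatically. It assumes that the highest numeric <id> belongs to the most recently created directory. The function name latest_hue_process_dir is hypothetical, and the script must be run on the Hue host by a user who can read the process directory.

```shell
#!/bin/sh
# Print the <id>-hue-HUE_SERVER directory with the highest numeric id.
# Assumption: a higher id means a more recently created process directory.
latest_hue_process_dir() {
    base=${1:-/var/run/cloudera-scm-agent/process}
    for d in "$base"/*-hue-HUE_SERVER; do
        [ -d "$d" ] || continue
        name=${d##*/}                          # e.g. 123-hue-HUE_SERVER
        printf '%s %s\n' "${name%%-*}" "$d"    # "<id> <path>"
    done | sort -n | tail -n 1 | cut -d' ' -f2-
}

dir=$(latest_hue_process_dir)
if [ -n "$dir" ]; then
    ls -l "$dir/logs" || echo "cannot read $dir/logs" >&2
else
    echo "no Hue process directory found on this host"
fi
```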

From CDH-5.7 onwards:

Below are two ways to enable DEBUG messages for all the logs in /var/log/hue:

1) Cloudera Manager:
    - Go to Hue --> Configuration --> check Enable Django Debug Mode --> and Save Changes --> Restart Hue.

2) Hue Web UI:
    - Go to the Home page --> select Server Logs --> check Force Debug Level. Debug is enabled on-the-fly.