Main Page


Welcome to the TWiki page of the Florida Tech Tier-3 Cluster. Here you will find links to the documentation from previous years:

Admin Manuals

Semester Logs

Fall 2015

Summer 2015

Spring 2015

Fall 2014

Summer 2014

Spring 2014

Fall 2013

Summer 2013

Spring 2013

Fall 2012

Spring 2012

Fall 2011

Spring 2011

Fall 2010

Summer 2010

Spring 2010

Fall 2009

Summer 2009

Spring 2009

Fall 2008

Fall 2007


Troubleshoot FAQs

  • SAM jobs are not coming to the site
  1. Check the Troubleshoot CE OSG page.
  2. Check that the firewall is allowing connections properly: run iptables -L -n and check that port 2119 (GRAM CE) or 9619 (HTCondor CE) is open.
  3. Check whether the IT department has changed its firewall settings.
  4. Check that the SAM user (grid0002, or whatever account it maps to) has proper ownership of the directory in /var/lib/globus/gram_state.
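The firewall check above can be scripted. The following is a minimal sketch, not a site-provided tool: port_accepted is a hypothetical helper name, and it only recognizes single-port rules printed as "dpt:<port>" in iptables -L -n output. Rules written with the multiport match or with port ranges are printed differently and would be missed.

```shell
# Hypothetical helper: succeed if an ACCEPT rule for the given destination
# port appears in `iptables -L -n` style text read from stdin.
port_accepted() {
    port="$1"
    # Compare whole fields so that dpt:2119 does not match dpt:21190.
    awk -v p="dpt:${port}" '
        $1 == "ACCEPT" { for (i = 1; i <= NF; i++) if ($i == p) found = 1 }
        END { exit !found }
    '
}
```

Possible usage on the CE, as root: iptables -L -n | port_accepted 9619 && echo "HTCondor CE port is open"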
  • GUMS page is down (cannot access the GUMS page from the browser)


  • RSV test flips between OK and critical
  • VOPUT SAM SE test flips between ok and critical
  • Ownership of the files is shown as nobody
  • Cannot access home directories
Check whether the user has an entry in /etc/auto.home. If not, add the entry, then reload and restart autofs:
service autofs reload
service autofs restart
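To see whether a user already has a map entry before editing the file, something like the sketch below may help. has_autohome_entry is a hypothetical helper name, and the sketch assumes the usual autofs map layout in which each entry starts with the key (here the user name) followed by whitespace.

```shell
# Hypothetical helper: succeed if the user has an entry in the autofs map.
# The map path defaults to /etc/auto.home and can be overridden for testing.
has_autohome_entry() {
    user="$1"
    map="${2:-/etc/auto.home}"
    # Match on the first field only, so commented-out lines do not count.
    awk -v u="$user" '$1 == u { found = 1 } END { exit !found }' "$map"
}
```

Possible usage: has_autohome_entry "$USER" || echo "no entry for $USER in /etc/auto.home"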

  • Rocks re-installs OS after restarting the nodes/CE/SE
  1. Run the following commands from the CE:
tentakel service rocks-grub stop
tentakel chkconfig rocks-grub off
  • SAM test shows jobs running as nobody (Condor runs as nobody, and will have problems accessing certain directories/files)
  • Condor is not running on the nodes
  • SAM Glexec test fails with the error:
[gLExec]:  LCMAPS failed.
The reason can be found in the syslog.
Error: error 203 executing /usr/sbin/glexec getting payload uid

Solution: Run fetch-crl on all the nodes
  • GUMS breaks after an update:
Solution: This usually happens because symlinks get broken by the update (mostly because the Rocks repo has the 3.X version of antlr while CentOS has 2.X).
          If you want to keep using 2.X, type the following in a terminal:
          ls -l /usr/lib/gums/[antlr].jar /var/lib/tomcat6/webapps/gums/WEB-INF/lib/[antlr].jar
          They should show:    
          lrwxrwxrwx 1 root root 25 Sep  1 20:33 /usr/lib/gums/[antlr].jar -> /usr/share/java/antlr.jar
          lrwxrwxrwx 1 root root 25 Sep  1 20:33 /var/lib/tomcat6/webapps/gums/WEB-INF/lib/[antlr].jar -> /usr/share/java/antlr.jar
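The comparison against the expected listing can also be done programmatically. This is only a sketch: check_link is a hypothetical helper name, and the link path and expected target are whatever the listing above shows on your system.

```shell
# Hypothetical helper: verify that a path is a symlink to the expected target.
check_link() {
    link="$1"
    expected="$2"
    if [ ! -L "$link" ]; then
        echo "not a symlink: $link"
        return 1
    fi
    # readlink prints the stored target without resolving it further.
    target=$(readlink "$link")
    if [ "$target" != "$expected" ]; then
        echo "$link -> $target (expected $expected)"
        return 1
    fi
    echo "ok: $link -> $target"
}
```

Call it once per link, passing the link path as the first argument and /usr/share/java/antlr.jar as the second; a non-zero exit status means the link needs to be recreated.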

Useful Tricks/Tips

  • Checking and killing held jobs in Condor
  1. To check which jobs are held, run
condor_q -const "jobstatus==5"
  2. To kill them, run
condor_rm -const "jobstatus==5"
  • Name of the machines where the jobs are running
  1. condor_q -run: list running jobs along with the machines they are running on
  2. condor_history -userlog <file.log>: list basic information recorded in the log file (use condor_logview <file.log> to view the information graphically)
  3. condor_history -long XXX.YYY | grep LastRemoteHost: show the machine where job XXX.YYY was executed
  4. condor_history -constraint 'RemoveReason=!=UNDEFINED': show your jobs that were removed before completion (other constraints can be used as well)
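The value 5 in the held-job commands above is the HTCondor JobStatus code for "held". As a small convenience sketch, the helpers below (hypothetical names jobstatus_code and status_constraint, not part of HTCondor) map the common state names to their codes and build the constraint string.

```shell
# Hypothetical helper: print the numeric JobStatus code for a state name.
jobstatus_code() {
    case "$1" in
        idle)      echo 1 ;;
        running)   echo 2 ;;
        removed)   echo 3 ;;
        completed) echo 4 ;;
        held)      echo 5 ;;
        *) echo "unknown status: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: build a constraint expression for condor_q / condor_rm.
status_constraint() {
    code=$(jobstatus_code "$1") || return 1
    echo "JobStatus == ${code}"
}
```

Possible usage: condor_q -constraint "$(status_constraint held)" to list held jobs, then the same constraint with condor_rm once the list has been reviewed.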

Setting up your analysis (CMS) environment

BASH

From a bash terminal, type:

source /cmssoft/cms/cmsset_default.sh

Then go to the src directory in the CMS release directory (say CMSSW_6_2_5/src) and do

cmsenv

To run crab jobs, you need to source the crab.sh file

source /cmssoft/cms/crab/crab.sh
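For batch or wrapper scripts, the bash steps above can be collected into one function. This is a sketch under assumptions: setup_cmssw is a hypothetical name, and it assumes cmsenv is the usual alias for eval `scramv1 runtime -sh`. Aliases are not expanded in non-interactive bash by default, which is why the sketch calls scramv1 directly instead of cmsenv.

```shell
# Hypothetical helper: set up the CMSSW environment for a release area.
# $1: path to the release src directory (e.g. .../CMSSW_6_2_5/src)
# $2: optional path to the setup script (defaults to the one used above)
setup_cmssw() {
    src_dir="$1"
    setup="${2:-/cmssoft/cms/cmsset_default.sh}"
    if [ ! -r "$setup" ]; then
        echo "missing setup script: $setup" >&2
        return 1
    fi
    if [ ! -d "$src_dir" ]; then
        echo "missing release area: $src_dir" >&2
        return 1
    fi
    source "$setup" || return 1
    cd "$src_dir" || return 1
    # Assumed equivalent of the cmsenv alias.
    eval "$(scramv1 runtime -sh)"
}
```

Possible usage inside a job script: setup_cmssw "$HOME/CMSSW_6_2_5/src" || exit 1 (the location under $HOME is only an example).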

TCSH

From a tcsh terminal, type:

source /cmssoft/cms/cmsset_default.csh

Then go to the src directory in the CMS release directory (say CMSSW_6_2_5/src) and do

cmsenv

To run crab jobs, you need to source the crab.csh file

source /cmssoft/cms/crab/crab.csh

Most of the time, the analysis files/ROOT files will be stored at the UF site (contact Bockjoo Kim to get an account). To list the contents of your directory there, run

lcg-ls -b --vo cms -D srmv2 -T srmv2 -v "srm://srm.ihepa.ufl.edu:8443/srm/v2/server?SFN=/cms/data/store/user/<user-name>/"

To copy files, use the lcg-cp command

lcg-cp -b --vo cms -D srmv2 "srm://srm.ihepa.ufl.edu:8443/srm/v2/server?SFN=<file-name with directory path>" "file:<destination-path>"
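Since both commands share the same long SRM prefix, a small helper can build the URL from a storage path. This is a sketch for bash: uf_srm_url and UF_SRM_ENDPOINT are hypothetical names, and the endpoint is simply the one used in the commands above.

```shell
# Endpoint copied from the lcg-ls / lcg-cp commands above.
UF_SRM_ENDPOINT='srm://srm.ihepa.ufl.edu:8443/srm/v2/server'

# Hypothetical helper: print the full SRM URL for a storage path.
uf_srm_url() {
    printf '%s?SFN=%s\n' "$UF_SRM_ENDPOINT" "$1"
}
```

Possible usage, assuming your user name at the UF site matches your local login: lcg-ls -b --vo cms -D srmv2 -T srmv2 -v "$(uf_srm_url "/cms/data/store/user/$USER/")". Quoting the result keeps the shell from treating the ? in the URL as a wildcard.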
