Skip to content

charngda/tacc_stats

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

tacc_stats                                             John L. Hammond
                                                           Jun 14 2011

tacc_stats is a job-oriented and logically structured version of the
conventional sysstat system monitor.  

1. Invocation

On each Ranger compute node tacc_stats runs every 10-minutes (through
cron), and at the beginning and end of every job (through SGE
prolog/epilog).  In addition, tacc_stats may be directly invoked by
the user (or application) although we have not advertised this.

WARNING Lack of coordination.

2. Data Handling

On each invocation, tacc_stats collects and records system statistics
to a structured text file on ram backed storage local to the node.
Stats files are rotated at every night at 23:55 localtime, and
archived at sometime between 02:00--04:00 localtime to Ranger's
/scratch filesystem.  A stats file created at epoch time EPOCH, on
node HOSTNAME, will be stored locally as /var/log/tacc_stats/EPOCH,
and archived at
/scratch/projects/tacc_stats/archive/HOSTNAME/EPOCH.gz.  For example
stats collected on Jun 14 2011 on i101-101, might correspond to files
/var/log/tacc_stats/1308027601 and
/scratch/projects/tacc_stats/archive/i101-101.ranger.tacc.utexas.edu/1308027601.gz.

WARNING Do not expect all stats files to be created at midnight
exactly, or even approximately.  As nodes are rebooted, new
stats_files will be created as soon as a job begins or the cron task
runs.

WARNING Stats from a given job on a give host may span multiple files.

WARNING Expect stats files to be missing occasionally, as nodes may
crash before they can be archived.  Since we use ram backed storage
these files do not survive a reboot.

3. Stats File Format

A stats file consists of a multiline header, followed my one or more
record groups.  The first few lines of the header identify the version
of tacc_stats, the FQDN of the host, it's uname, it's uptime in seconds, and
other properties to be specified.

  $tacc_stats 1.0.2
  $hostname i101-101.ranger.tacc.utexas.edu
  $uname Linux x86_64 2.6.18-194.32.1.el5_TACC #18 SMP Mon Mar 14 22:24:19 CDT 2011
  $uptime 4753669

These are followed by schema descriptors for each of the types collected:

  !amd64_pmc CTL0,C CTL1,C CTL2,C CTL3,C CTR0,E,W=48 CTR1,E,W=48 CTR2,E,W=48 CTR3,E,W=48
  !cpu user,E,U=cs nice,E,U=cs system,E,U=cs idle,E,U=cs iowait,E,U=cs irq,E,U=cs softirq,E,U=cs
  !lnet tx_msgs,E rx_msgs,E rx_msgs_dropped,E tx_bytes,E,U=B rx_bytes,E,U=B rx_bytes_dropped,E
  !ps ctxt,E processes,E load_1 load_5 load_15 nr_running nr_threads
  ...

A schema descriptor consists of the character '!' followed by the
type, followed by a space separated list of elements.  Each element
consists of a key name, followed by a comma-separated list of options;
the options currently used are:
  E meaning that the counter is an event counter,
  W=<BITS> meaning that the counter is <BITS> wide (as opposed to 64),
  C meaning that the value is a control register, not a counter,
  U=<STR> meaning that the value is in units specified by <STR>.

Note especially the event and width options.  Certain counters, such
as the performance counters are subject to rollover, and as such their
widths must be known for the values to be interpreted correctly.

WARNING The archived stats files do not account for rollover.  This
task is left for postprocessing.

A record group consists of a blank line, a line containing the epoch
time of the record and the current jobid, zero of more lines of marks
(each starting with the % character), and several lines of statistics.


  1307509201 1981063
  %begin 1981063
  amd64_pmc 11 4259958 4391234 4423427 4405240 235835341001110 187269740525248 62227761639015 177902917871843
  amd64_pmc 10 4259958 4391234 4405239 4423427 221601328309784 187292967300939 47879507215852 174113618669738
  amd64_pmc 13 4259958 4405238 4391234 4423427 211997466129346 215850892876689 2218837366391 233806061617899
  amd64_pmc 12 4392928 4259958 4391234 4423427 6782043270201 102683296940807 2584394368284 174209034378272
  ...
  cpu 11 429720418 0 1685980 43516346 447875 155 3443
  cpu 10 429988676 0 1675476 43150935 559410 8 283
  ...
  net ib0 0 0 55915434547 0 0 0 0 0 0 0 0 0 159301288 0 46963995550 0 0 97 0 0 0 31404022 0
  ...
  ps - 4059349377 507410 1600 1600 1600 18 373
  ...

Each line of statistics contains the type (amd64_pmc, cpu, net,
ps,...), the device (11,10,13,12,...,ib0,-...), followed by the
counter values in the order given by the schema.  Note that when we
cannot meaningfully attach statistics to a device, we use '-' as the
device name.

4. Types

The types currently collected on Ranger are: 

  amd64_pmc  AMD Opteron performance counters (per core),
  block      block device statistics (per device),
  cpu        scheduler accounting (per CPU),
  ib_sw      InfiniBand usage,
  llite      Lustre filesystem usage (per mount),
  lnet       Lustre network usage,
  mem        memory usage (per socket)
  net        network device usage (per device)
  numa       weird NUMA statistics (per socket),
  ps         process statistics,
  sysv_shm   SysV shared memory segment usage,
  tmpfs      ram-backed filesystem usage (per mount),
  vfs        dentry/file/inode cache usage,
  vm         virtual memory statistics.

For the keys associated with each type, see the appropriate schema.
For the source and meanings of the counters, see the tacc_stats source
https://github.com/jhammond/tacc_stats, the CentOS 5.6 kernel source,
especially Documentation/*, and the manpages, especially proc(5).

I have not tracked down the meanings of all counters.  However, if I
did (and it wasn't obvious from the counter name) then I put that
information in the source (see for example block.c).

WARNING Due to a bug in Lustre, llite overreports read_bytes.

WARNING Some event counters (from ib_sw, numa, and possibly others)
suffer from occasional dips.  This may be due to non-atomic accesses
in the (kernel) code that presents the counter, a bug in tacc_stats,
or some other condition.  Spurious rollover is easy to detect,
however, because a naive adjustment produced a riduculously large
delta.

WARNING We never reset counters, thus to determine the number of
events that occurred during a job, you must subtract the value at
begin from end.

WARNING Due to a quirk in the Opteron performance counter
architecture, we do not assign the same set of events to each core,
see amd64_pmc.c in the tacc_stats source for details.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C 79.6%
  • Python 15.0%
  • Shell 5.4%