Changes from 4.1.2p1 -> 4.2 (10 Aug 2006)
-----------------------------------------

New features:
* A major overhaul of the "Critical Systems" (NK) webpage. A new
  hobbit-nkview CGI has been added, allowing much more flexible
  handling of the NK alerts. This allows the 24x7 monitoring staff
  to group alerts by priority, to filter out acknowledged alerts
  that have been delegated to resolver groups, and to permanently
  enable or disable whether a status appears on their monitoring
  console when a system goes into or is taken out of production.
  It also provides direct access to special instructions for the
  monitoring staff.
  To accommodate all of these new configuration items, a separate
  "Critical Systems Configuration" file is introduced, which is
  separate from the bb-hosts file. A web-based configuration tool -
  the "hobbit-nkedit" CGI - is also provided, allowing the
  monitoring operations staff to control which systems appear and
  how their monitoring is set up.
  Since hobbit-nkview is a CGI script, the Critical Systems page
  will now always show a completely up-to-date version of the
  Hobbit systems status.
* The Hobbit client will now report logfile data as part of the
  client data. Which logfiles to monitor and how much data to send
  is configured centrally on the Hobbit server, and automatically
  transmitted to the client when it contacts the Hobbit server. See
  the logfetch(1) man-page for details. Apart from size limitations,
  the logfile data is not filtered by the client.
* Two new tools - hobbitfetch and msgcache - can be used to
  implement a "pull" style of clients, which may be useful if you
  have systems that cannot make network connections to the Hobbit
  server to deliver their data. See the hobbitfetch(8) man-page for
  details.
* Disabling a status can now be done until the status goes to an
  OK state.
* A "bulletin_header" or "bulletin_footer" file can now be created
  in the ~hobbit/server/web/ directory. This will automatically get
  added to the header or footer of all webpages.
* All configuration files now support the use of "include"
  statements to split the configuration into several files.
* All configuration files now support the statement
  "directory DIRNAME"; this causes all normal files in this
  directory and below to be included as part of the configuration
  file. Note that the order in which files are included this way
  cannot be controlled.
* An acknowledgment sent to Hobbit is only deleted after the status
  has been OK for a while (12 minutes). This allows a status to be
  acknowledged, then briefly go OK but return to critical shortly
  after, without resetting the alert timers.
* A new "files" status column uses data from the client to monitor
  file- and directory attributes and sizes. This can be fed into
  graphs, so you can track the sizes of individual files or
  directories.
* A new "ports" status column uses data from the client to monitor
  network ports. This can be fed into graphs, so you can track the
  number of connections to a given service, from a specific
  IP-address, or in a particular state. This is based on code
  provided by Mirko Saam.
* The Hobbit clients now report network interface statistics. This
  allows Hobbit to track network utilisation, like MRTG does. Note
  that due to limitations in many client operating systems, and the
  fact that Hobbit only records network statistics every 5 minutes,
  this is currently limited to approximately 100 Mbit/sec
  interfaces (Fast Ethernet). Gigabit interfaces are not handled
  correctly and will show much lower bandwidth usage than they
  actually have.
* When a host status goes critical, any client data from the host
  is saved. This allows detailed analysis of the host status just
  prior to a critical event happening, which can be helpful for
  troubleshooting.

Improvements:
* All of the core programs have been profiled to identify
  performance bottlenecks.
  Several optimizations have been implemented; these are definitely
  noticeable when Hobbit is used in large installations (more than
  100 hosts). The most noticeable effects are a 25-40% drop in
  hobbitd CPU utilisation, a 50% drop in CPU utilisation of the
  hobbitd_client module, and a huge speedup in the hobbit-enadis
  CGI program.
* DOWNTIME can now be applied to individual tests, and you can
  specify a text explaining why the service is down. The new format
  in the bb-hosts file is:
     DOWNTIME=columns:days:starttime:endtime:cause
  The "columns" field can be a comma-separated list of services, or
  "*" to match all services. You can also define multiple
  downtime-settings, separated by ";":
     "DOWNTIME=http,ftp:*:0300:0315:CMS update;*:0:0200:0210:Reboot"
* Status changes that happen during DOWNTIME periods do not update
  the last-status-change timestamp in hobbitd. Hence a service that
  was red before the start of DOWNTIME and stays red during downtime
  will not appear to have changed recently - so it won't reappear on
  the NK view when using time-based filters.
* A new hobbit-statusreport CGI has been added. This CGI can
  generate an HTML report of all hosts with a given status column,
  e.g. all SSL certificates. By using filters, you can pick out
  those hosts where e.g. the status is non-green. A sample
  hobbit-certreport.sh script is included that uses this CGI to
  generate a report of all SSL certificates which are about to
  expire, i.e. have a red or yellow status.
* When using alerts with FORMAT=SMS, the text will now clearly show
  when the status has RECOVERED.
* The hobbitsvc CGI no longer needs to load the bb-hosts file to get
  information about the hosts' IP-address and display-name.
  Instead, it gets them from hobbitd as part of the normal request
  to get the current status. This avoids a lot of file I/O when
  looking at the detailed status page.
* You can now explicitly choose which colors cause a status to
  appear on the BB2 page, via the "--bb2-colors=COLOR,COLOR" option
  to bbgen.
* When zooming a graph, the legend now states the period that the
  graph covers.
* bb-findhost now includes JavaScript code to make input focus go
  to the input field immediately.
* SSH testing now sends an SSH version string. This should eliminate
  some log warnings (note: requires updating of the bb-services
  file).
* All BLUE status handling is now done by hobbitd. As a result, the
  detailed status shown while a status is BLUE will include the
  original status message color in the status text, even though the
  status will show up as blue.
* bbhostgrep now supports a "--no-down" option to omit hosts that
  are currently known to be down. This is determined by the current
  "conn" status.
* The hobbit-mailack tool now recognizes ACK codes if given as a
  line "ack NUMBER" inside the mail body. Also, the duration of the
  acknowledgment can be given through a "delay MINUTES" line in the
  message body.
* The hobbitd client module can now run in a "local" mode, allowing
  for configurations to be maintained on individual servers.
* Historical status logs generated by planned downtime settings now
  include a notice in the stored status log that the status was
  blue due to planned downtime.
* For the NCV RRD handler, you can now set all datasets to a
  specific type by listing them as "NCV_foo=*:GAUGE".
* The "bbdigest" utility no longer uses the OpenSSL library, and has
  been included in the client installation.
* The "procs" (and "msgs" and "ports") status messages can now be
  configured to show an understandable legend for each of the
  checks, instead of the incomprehensible regular expression used
  to match the process listing.
* Individual process counts (matched through a PROC entry in
  hobbit-clients.cfg) can be tracked in a graph.
* The event-log report can now select hosts based on the page they
  are on.
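The DOWNTIME syntax described in the 4.2 notes can be illustrated with a small parser. This is a minimal Python sketch, not Hobbit's own code: the Downtime record and the parse_downtime() / applies_to() helpers are hypothetical names, it only handles the five-field columns:days:starttime:endtime:cause form, and it treats the days and time fields as opaque strings rather than guessing at their exact semantics.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Downtime:
    """One DOWNTIME period (hypothetical structure, for illustration)."""
    columns: List[str]  # service names, or ["*"] for all services
    days: str           # kept as an opaque string in this sketch
    start: str          # HHMM
    end: str            # HHMM
    cause: str          # free-text explanation


def parse_downtime(value: str) -> List[Downtime]:
    """Parse the text after 'DOWNTIME=' into its ';'-separated periods."""
    periods = []
    for spec in value.split(";"):
        # maxsplit=4 keeps any ':' inside the cause text intact; a spec
        # with fewer than five fields raises ValueError on unpacking.
        columns, days, start, end, cause = spec.split(":", 4)
        periods.append(Downtime(columns.split(","), days, start, end, cause))
    return periods


def applies_to(period: Downtime, column: str) -> bool:
    """True if the period covers the given status column."""
    return "*" in period.columns or column in period.columns


if __name__ == "__main__":
    example = "http,ftp:*:0300:0315:CMS update;*:0:0200:0210:Reboot"
    for p in parse_downtime(example):
        print(p.columns, p.days, p.start, p.end, repr(p.cause))
```

With the example setting, the first period covers only "http" and "ftp", while the second uses "*" and therefore applies to every column on the host.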
Bugs:
* The hobbitreports.sh script would calculate the wrong year and
  week-number for the end-of-year weekly reports.
* hobbitd_client ignored qualifiers for individual rules.
* The "query" command would return the wrong color for a disabled
  test.
* The ncv handler would fail on input lines that had extra text
  after the value, e.g. "Temp: 23 degrees".
* Memory reports from Mac OS X (Darwin) clients were ignored.

Other:
* The "NK" and "NKTIME" tags in the bb-hosts file have been
  deprecated with the introduction of the hobbit-nkview CGI.
  Similarly, the static NK webpage generated from these tags is now
  deprecated. They will continue to work for some time, but support
  for these will be removed in a future release of Hobbit.
* The acknowledgment system has been redesigned, to allow for acks
  to happen at multiple levels of operation. E.g. the 24x7
  monitoring staff may acknowledge an alert after raising a
  trouble-ticket, and a technician may acknowledge the same alert
  when responding to the ticket. This is currently used only by the
  new Critical Systems page, but a future release will use this for
  more advanced handling of alerts.
* The HTML "Content-type" generated by the Hobbit CGIs can be
  configured through the HTMLCONTENTTYPE setting. This may be
  useful for sites where documentation embedded in the Hobbit
  webpages is in a different character set, e.g. UTF-8 or Japanese.


Changes from 4.1.2p1 -> 4.1.2p2 (02 Aug 2006)
---------------------------------------------

Bugfixes:
* [SECURITY] "config" commands would allow remote reading of any
  file accessible by the hobbit user.
* Hosts were showing up twice on alternate pageset pages.
* Several programs would crash if a configuration file had a
  newline at exactly 4 KB offset into the file.
* Possible segfault in bbgen due to an uninitialised variable.
* Malformed history files could cause errors in reporting.
* "query" of a disabled status would report the wrong color.
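The "approximately 100 Mbit/sec" ceiling mentioned for the 4.2 network interface statistics is consistent with 32-bit byte counters sampled every 5 minutes. The sketch below assumes that is the client-side limitation (the notes only say "limitations in many client operating systems"), and the helper names are hypothetical. A 32-bit counter wraps after 2^32 bytes, so any sustained rate above roughly 114.5 Mbit/sec moves at least 2^32 bytes in a 300-second interval; the extra wraps are invisible to a modular delta, which is why faster interfaces appear to use less bandwidth than they really do.

```python
COUNTER_BITS = 32   # assumption: the OS exposes 32-bit byte counters
INTERVAL_S = 300    # Hobbit records network statistics every 5 minutes


def counter_delta(prev: int, curr: int, bits: int = COUNTER_BITS) -> int:
    """Bytes between two samples; only correct if the counter wrapped at most once."""
    return (curr - prev) % (2 ** bits)


def max_unambiguous_mbit(bits: int = COUNTER_BITS, interval: int = INTERVAL_S) -> float:
    """Highest average rate (Mbit/s) whose per-interval byte count fits in the counter."""
    return (2 ** bits) * 8 / interval / 1e6


def observed_mbit(true_mbit: float, bits: int = COUNTER_BITS, interval: int = INTERVAL_S) -> float:
    """Rate that a modular delta would report for a steady true rate."""
    true_bytes = int(true_mbit * 1e6 / 8 * interval)
    return (true_bytes % (2 ** bits)) * 8 / interval / 1e6


if __name__ == "__main__":
    print(f"limit: {max_unambiguous_mbit():.1f} Mbit/s")                 # limit: 114.5 Mbit/s
    print(f"50 Mbit/s reads as {observed_mbit(50):.1f} Mbit/s")          # 50 Mbit/s reads as 50.0 Mbit/s
    print(f"600 Mbit/s reads as {observed_mbit(600):.1f} Mbit/s")        # 600 Mbit/s reads as 27.3 Mbit/s
```

A steady 600 Mbit/sec load wraps the counter five times per interval, so only the remainder (about 27 Mbit/sec) is reported.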
Changes from 4.1.2 -> 4.1.2p1 (10 Nov 2005)
-------------------------------------------

Bugfixes:
* hobbitd could crash when processing a "combo" message, due to an
  incorrect test of the color of one message inside the
  combo-message before that color was actually deciphered.
* hobbitd_alert would crash when attempting to save a checkpoint
  while cleaning up a "dead" alert.
* Disk graphs would show up with all graphs in a single image, when
  the status message included any colored icons (typically, when
  the status was red or yellow).
* The handling of alerts was counting the duration of an event
  based on when the color last changed. This meant that each time
  the color changed, any DURATION counters were reset. This would
  prevent alerts from going out if a status was changing between
  yellow and red faster than any DURATION setting. Changed this to
  count the event start as the *first* time the status went into an
  alert state (yellow or red, usually).
* An idle hobbitd process would accumulate zombie processes.

Improvements:
* When a status goes yellow->red, the repeat-interval is now cleared
  for any alerts. This makes sure you get an alert immediately for
  the most severe state seen. This only affects the first such
  transition; if the status later changes between yellow/red, the
  normal REPEAT interval applies.


Changes from 4.1.1 -> 4.1.2 (11 Oct 2005)
-----------------------------------------

NOTE: If you are upgrading from 4.1.1, you MUST change the
~hobbit/server/etc/hobbit-clients.cfg file and add a line with the
text "DEFAULT" before the default settings in the file.

Post-RC1 fixes:
* Linux client now runs ps with "-w" to get more of the commandline.
* Disk status reports from clients are now processed with sed to
  make sure each mounted filesystem appears on a single line.
* A missing "req." text in the "procs" status report was fixed.
* Web checks now recognize status "100" as OK.
* All of the build scripts using "head -1" now use the POSIX'ly
  correct "head -n 1" instead.
* Documentation update: We do have a client now; the
  hobbit-clients.cfg man-page was added.

Bugfixes:
* The hobbit module handling client reports had several bugs in the
  way it interpreted the hobbit-clients.cfg file. These were fixed,
  but this necessitated that the default settings are explicitly
  flagged with a "DEFAULT" line.
* When multiple recipients of an alert had different minimum
  duration and/or repeat-settings, they would mostly use only the
  settings for the first recipient.
* The alert module could continue sending alerts even though the
  rules in hobbit-alerts.cfg that triggered the alert had been
  modified.
* The hobbitd daemon would leak memory when responding to a "query"
  request. In extreme circumstances, it could also crash.
* The hobbitd_alert module could leak tiny amounts of memory.
* The default timeouts on the server- and client-side have been
  increased to allow some more time to send in status messages.
  Because of Hobbit's more aggressive use of "combo" messages and
  larger amounts of data, it could time out prematurely on busy
  servers.
* Hosts flagged with "nobb2" no longer appear in the acknowledge
  log on the BB2 page.
* The size of the shared-memory buffers used to pass data between
  the core hobbit daemon and the hobbitd_* modules has been
  increased, allowing for larger messages. These are now also
  configurable, so you can change them without having to recompile
  Hobbit (see the MAXMSG_* settings in hobbitserver.cfg(5)).
* bbproxy: When sending to multiple servers and the connection to
  one server fails, continue feeding a message to the other servers
  instead of dropping it completely.
* bbproxy: Rotating the logs will now also rotate the logfile for
  any debugging output.
* The RSS files now escape special characters, to make sure the
  file has a valid RSS XML syntax.
* The "cpu" status from a Hobbit client with a critical load would
  include texts showing both that the load was "high" and
  "critical". Changed to include only the most severe condition.
* The "ncv" (name-colon-value) data handler could easily read past
  end of input, and mis-interpret the input data. Fixed by Matti
  Klock.

Build fixes:
* An "autoconf" type of script is now used to check for some of the
  more common build problems. This should make the client build
  without problems on more platforms.
* The "configure" script will now recognize the "hobbit" user on
  systems using NIS or NIS+.
* "make install" would fail to set the proper permissions for the
  Hobbit client directories.
* The pre-built Debian- and RPM-packages failed to flag the client
  configuration files as such, so they would be overwritten by an
  upgrade. Note that this fix only takes effect AFTER you have
  installed the new .deb or .rpm files.
* The pre-built Debian- and RPM-packages failed to set permissions
  on the client logs/ and tmp/ directories.
* A separate "hobbit-client" package is now generated in both .rpm
  and .deb format.
* A bug in the "merge-sects" installation utility has been fixed.
  This could cause the installation to abort prematurely.

Server improvements:
* The acknowledge log on the BB2 page now has a configurable max
  number and max time of the entries listed, via the --max-ackcount
  and --max-acktime options for bbgen.
* The eventlog script now lets you filter out events based on the
  hostname, testname, color, and start/stop times. Thanks to Eric
  Schwimmer for contributing this.
* The "netstat" statistics module has been upgraded to collect
  byte-counter statistics for HP-UX, AIX, OSF/1 and *BSD. This
  might only work with the Hobbit client, not the BB/LARRD
  bf-netstat module.
* A "group-except" definition is now supported in the bb-hosts
  file, to show all tests EXCEPT certain ones for a group of hosts.
* A "noclear" tag has been added.
  This is the equivalent of defining all network tests with the
  '~' - "always report true status" - flag.
* RPC tests now only run if the ping check of the server did not
  fail.
* The default size of the RRD graphs can be defined via the
  RRDHEIGHT and RRDWIDTH settings in hobbitserver.cfg.
* The hobbitgraph CGI can now generate comparison graphs with e.g.
  the load graphs from multiple hosts. However, a front-end tool for
  requesting these has not yet been created.
* Stale RRD files (.rrd files that haven't been updated for more
  than 1 day) are no longer included when generating the graphs on
  the status pages. They still show up if you view the graphs on
  the "trends" status column.
* The hobbit-confreport CGI is now installed correctly, including
  an item in the "Reports" menu.
* The "info" column now includes the "uname" data from the Hobbit
  client, allowing you to see what operating system is running on
  the host.

Client improvements:
* The "runclient.sh" script now accepts two commandline options:
  "--hostname=CLIENT.HOST.NAME" lets you override the default
  hostname that the client uses when it reports data to Hobbit;
  "--os=OSNAME" lets you override the operating system name for
  certain Linux systems. See the README.CLIENT file.
* The client is now re-locatable, i.e. you can pack up the "client"
  directory and move it to another box or another location for
  easier deployment of the client on multiple boxes running the
  same operating system.
* Client installations on NIS-based systems should now work.
* AIX is now supported - has been tested with 4.3.3 and 5.x.
* OSF/1 is now supported (4.x and 5.x).
* HP-UX support should work.
* Darwin / Mac OS X is now supported.
* {Free,Net,Open}BSD-based systems failed to build the meminfo
  tool, so memory reporting was broken.
* Solaris clients now report all types of filesystems (notably
  "vxfs" filesystems were omitted before).
* The client configuration now lets you check for filesystems that
  MUST be mounted.
* The clients now switch to the POSIX locale - previously the
  system locale was used, which could result in status messages in
  languages that the backend did not expect.

Other:
* A new utility "demotool" can be used to simulate a number of
  servers to Hobbit. This may be useful when demonstrating Hobbit
  to new users. Note: This is not included in the default build -
  to build it, run "make demo-build".


Changes from 4.1.0 -> 4.1.1 (25 Jul 2005)
-----------------------------------------

Bugfixes:
* The Hobbit client mis-interpreted the "df" output from
  filesystems with longer-than-usual device names (e.g.
  network-mounted filesystems), resulting in some rather incredible
  values for the disk utilisation.
* hobbitsvc.cgi could crash if some of the input parameters from
  the URL were missing. This would only happen if you accessed it
  via a URL that was not created by Hobbit.
* The hobbitlaunch.cfg file for the server was missing a line,
  causing it to run the local client much more often than intended.
* A faulty initialisation when reloading the Hobbit daemon state
  could leave a broken pointer in a log-record. If this was then
  accessed by a "hobbitdxboard" or a "drop" command, hobbitd would
  crash.

Build problems:
* The "configure" script failed on certain systems with a
  "cannot shift" error.
* Building the client on systems without the PCRE headers would
  fail.
* Building Hobbit - client or server - would fail on a system if
  the PCRE headers were in a non-standard location.


Changes from 4.0.4 -> 4.1.0 (24 Jul 2005)
-----------------------------------------

A Hobbit client for Unix systems has been implemented, and this was
found important enough to warrant bumping the version number to
4.1. The README.CLIENT file has the details on how to use it. The
client is automatically installed as part of a server installation.

Server bugfixes:
* [SECURITY] The Hobbit daemon (hobbitd) could crash when
  processing certain types of messages.
  It is believed that this could only be used for a
  denial-of-service attack against Hobbit, although it cannot be
  completely ruled out that an attacker might be able to exploit it
  to run arbitrary code with the privileges of the hobbit user.
  Thanks to Vernon Everett and Stefan Loos for their efforts in
  helping me track down these bugs.
* Work around a bug in KHTML based browsers (KDE's Konqueror,
  Mac OS X Safari) when generating reports: They cannot handle
  "multipart/mixed" documents, but only offer to save the document
  instead of sending you off to the report URL.
* Fix a build problem on OpenBSD: Apparently OpenBSD's linker does
  not recognize the --rpath option.
* A memory leak in the Hobbit daemon has been fixed (it would leak
  memory upon each reload of the bb-hosts file, which is done every
  5 minutes).
* Status messages using "&green" or another color in the first line
  of the status message would display the "&green" text instead of
  the color GIF image.
* bbtest-net's collection of DNS responses has been delayed until
  an actual test is queued. Previously, a host with a "testip" flag
  could end up with a DNS lookup, which doesn't really make sense.
* Handling of the "notrends" tag was broken.
* The duration string should no longer be included in the webpage
  showing a disabled test. (Only applies to tests disabled after
  installing Hobbit 4.0.5).
* bbtest-net now reports "Hobbit" in the User-Agent header of all
  web requests, instead of "BigBrother".
* If an alert was configured to be sent only during certain periods
  of time, the recovery message would be suppressed if the recovery
  happened outside of the alerting period. Changed so that recovery
  messages ignore the time-based restrictions.
* hobbit-mailack would generate acks valid for 30 minutes, instead
  of the documented 60 minutes. Changed to use 60 minutes.
* An off-by-one error in the routine generating the HTML document
  headers and footers was caught by Valgrind.
* A number of minor documentation fixes.
* Memory reports from Win32 clients using the Big Brother client
  could trigger an overflow when calculating the memory usage,
  resulting in memory utilization being reported as 0. Changed to
  use a larger internal representation for the memory sizes.

Server improvements:
* A new reporting tool, hobbit-confreport.cgi, provides a way of
  generating a printable report summarizing the Hobbit monitoring
  configuration for a single server or a group of servers.
* If a "custom" directory exists, you can have custom Hobbit tools
  located there and have them built during the normal build
  process.
* A status handed off to the hobbitd_alert module, but for which
  there is no alert recipient configured, would be re-checked every
  minute, causing a heavy spike in the CPU load if there were many
  such statuses. A small code change allows us to skip these until
  the configuration file changes.
* The code handling lookups of data from the bb-hosts file was
  changed to access the data via a tree-based search instead of a
  linear search. On large systems this provides a much more
  efficient retrieval of these data, reducing the overall load of
  Hobbit.
* The internal representation of status-data inside the hobbitd
  daemon now uses a more efficient tree-structure instead of a
  simple linked list.
* The NETFAILTEXT environment variable can be used to change the
  "not OK" text added to status messages of failed network tests.
* External commands used in network testing (ntpdate, rpcinfo,
  traceroute) now have max. 30 seconds to complete. This is to
  avoid a broken ntpdate or similar locking up the network tests.
  The "--cmdtimeout=N" option controls the length of the timeout.
* hobbitlaunch no longer logs every task started to the
  hobbitlaunch.log file - this could result in the log file growing
  to huge proportions. The "--verbose" option for hobbitlaunch will
  restore the old behaviour, if needed.
* A number of arbitrary limits on the size of various buffers,
  messages, queries and responses have been removed. Hobbit will
  now handle status-messages of practically any size, except that
  the interface between the main daemon and the worker modules
  (handling history, RRD files and alerts) is limited to 100 KB
  message size. Configuration files (bb-hosts, hobbit-alerts.cfg,
  hobbitserver.cfg, hobbitlaunch.cfg) can have lines of any length.
  Continuation lines are now supported in all configuration files.
* The moverrd.sh script is now included in the default
  installation.
* OpenBSD vmstat output is now supported.

LARRD / Hobbit cleanup:
Upon request from Craig Cook, the code and docs were changed to
clarify that Hobbit and LARRD are not related. I therefore decided
to remove references to "LARRD" in the configuration files,
resulting in these changes:
* LARRDCOLUMN renamed to TRENDSCOLUMN, and LARRDS renamed to
  TEST2RRD in hobbitserver.cfg (handled automatically by
  "make install").
* The bb-hosts "LARRD:" tag was renamed to "TRENDS:". Existing
  bb-hosts files using the old tag still work, though.
* The hobbitd_larrd program was renamed to hobbitd_rrd. The default
  hobbitlaunch.cfg file was also changed to reflect this, and the
  names of the logfiles from the two RRD update tasks were changed
  as well. All of this should happen automatically when running
  "make install", but if you have added extra options - e.g. for
  custom graphs - then you may need to re-do those modifications in
  hobbitlaunch.cfg.


Changes from 4.0.3 -> 4.0.4 (29 May 2005)
-----------------------------------------

Bugfixes:
* "nodisp" tag re-implemented for hosts that should not appear on
  the Hobbit webpages.
* Enabling the "apache" data collection could crash bbtest-net if
  the /server-status page returned was larger than 32 KB.
* The "bbcmd" tool would not pass a --debug option to the command
  it was running.
* Nested macros in hobbit-alerts.cfg were not working.
* Using TABs in hobbit-alerts.cfg could confuse the alert module.
* hobbitd_alert's --test option now determines the page-name for a
  host automatically. It will also now accept a time-parameter to
  simulate how alerts are processed at a specific time-of-day.
* Status messages from "dialup" hosts should not go purple.
* The "mailq" RRD handler would pick up the first number from the
  "requests" line, which might not be the right number.
* Scheduling a "disable" did not work.
* The startup-script was modified to correctly handle stale PID
  files left over from an unclean shutdown of Hobbit. These are now
  removed, and startup will proceed normally.

Improvements:
* CGI tools now log error-output to dedicated logs in
  /var/log/hobbit/
* The "bea-snmpstats.sh" script has been removed, and replaced with
  an enhanced tool "beastat". This collects statistics via SNMP
  from BEA Weblogic servers, and reports these via "data" messages
  to Hobbit. The data collected are run-time data for the JRockIT
  JVM and thread/memory utilization data from each Weblogic server
  instance.
* The "ntpstat" RRD handler now accepts the raw output from
  "ntpq -c rv".
* The heartbeat-timeout for hobbitd has been increased to 60
  seconds.


Changes from 4.0.2 -> 4.0.3 (22 May 2005)
-----------------------------------------

Bugfixes (general):
* The bb-datepage and bbmessage CGI tools were reading the POST
  data in an incorrect way; it worked on most systems, but did not
  adhere to the CGI specification.

Bugfixes (alerting):
* Acknowledgments were broken in 4.0.2.
* Using PAGE=... in hobbit-alerts.cfg to pick out hosts on the
  front-page was not possible. The front page is now recognized
  with the name "/", so PAGE=/ will find them.
* The hobbitd_alert --test option would not pick up the correct
  page-path settings from the bb-hosts file.
* The BBALPHAMSG text passed to scripts as a default alert message
  now includes the URL link to the statuslog.
Bugfixes (graphs and trends):
* hobbitgraph.cgi could show "@RRDPARAM@" on graphs if it was
  matched by a NULL string (would happen for mailq graphs).
* RRD files were not being updated while a status was blue
  (disabled), even though status messages were received. Changed so
  that blue logs are passed off to the RRD parser - we might as well
  track data when it's there.
* The "disk1" graph definition failed to take into account that the
  numbers logged were already in KB of data, so the axis-label was
  wrong.
* The "mailq" RRD handler now finds the queue-length regardless of
  whether it is before or after the "requests" keyword in the
  "mailq" or "nmailq" status message.
* If the "apache" data collection was specified for a host, but the
  host had no "http" tests, a bogus http status report was being
  generated.
* Network statistics used a conversion that would overflow on
  32-bit systems.

Bugfixes (web pages):
* Some header/footer files for the snapshot report were missing.
* The "info_header" and "info_footer" files were not being used for
  the "info" column pages.

Bugfixes (installation):
* Fix file-descriptor leak in the "setup-newfiles" tool used during
  installation. This could cause "make install" to abort while
  installing new files, on systems with a low setting for the max.
  number of simultaneous open files.
* Some systems keep libraries in a /lib64 directory - this was not
  being searched by the configure script.
* If Hobbit was built without SSL support and an SSL network test
  is configured, it will now always fail with a meaningful error
  message.
* If the configure script found SSL- or LDAP-libraries, but these
  could not link, the build would not disable SSL- and LDAP-support
  and therefore it failed. Now the support is disabled if the
  libraries do not work.
* The SSL test would link without required network support
  libraries on some platforms.
* The options for overriding SSL- and LDAP-include files did not
  match the documented name.
* AIX systems built with gcc were missing the OSDEF compile flags.

Improvements:
* All remnants of Big Brother compatibility have been removed. If
  you want to stick with the old Big Brother tool, use bbgen. This
  allowed for some much needed cleaning up of the bbgen code that
  loaded the status data, especially for the handling of purple
  status logs. Also, the sending of messages has dropped all
  support for the Big Brother method of sending "page" messages
  when a status-message is sent with an alert-color, so the BBPAGE
  and BBPAGERS settings are no longer used.
* The maint.pl script has been removed. A new tool,
  hobbit-enadis.cgi, replaces this with a native Hobbit tool. If
  you are upgrading, you should change the
  ~/server/www/menu/menu_items.js file to point to
  "hobbit-enadis.sh" instead of "maint.pl".
* The "info" column page now includes a form to disable and enable
  tests for a single host. If you prefer not to have this on the
  info page, add the option "--no-disable" to the
  hobbit-cgi/bb-hostsvc.sh wrapper.
* Some logfiles have been moved to the normal log-file directory -
  /var/log/hobbit:
  - nkstatus.log (from ~hobbit/server/)
  - notifications.log (from ~hobbit/data/acks/)
  - acklog (from ~hobbit/data/acks), also renamed to acknowledge.log
* The hobbit-mailack CGI will now find a "delay=DURATION" line in
  the mail message, and use that as the duration of the
  acknowledgment. Reports show that there are some types of
  mail/SMS systems where you cannot modify the message subject.
* Paul D. Backer contributed "favicon" images generated from the
  Hobbit "recent" GIF files. These have been added and are now
  loaded from the Hobbit web/*_header files, so that browsers
  supporting this (Mozilla, Firefox) will display a favicon-image
  in the titlebar or on the page-tab holding the Hobbit webpage.
* Two new settings in hobbitserver.cfg can be used to restrict
  which filesystems are being tracked by Hobbit.
  The NORRDDISKS / RRDDISKS settings are regular-expressions that
  the filesystem names are matched against. See the
  hobbitserver.cfg(5) man-page.
* Hobbit now works with RRDtool 1.2.x. Due to a bug in the first
  1.2.x releases of RRDtool, you must use at least version 1.2.2
  with Hobbit.
* A new report-output option lets you save a CSV (comma-separated
  values) file with the availability data for each host+test in
  Hobbit. This allows for further processing of the availability
  data, e.g. importing it into a spreadsheet. This can be selected
  from the report request webpage.
* It is now possible to define different values for an environment
  setting in hobbitserver.cfg, depending on a new "--area=..."
  command-line option for all tools. See hobbitserver.cfg.
* Apache 1.x performance data are now graphed correctly, by mapping
  the "BusyServers/IdleServers" data into the
  BusyWorkers/IdleWorkers datasets used by Apache 2.0 RRD graphs.
* sendmail RRDs for sendmail 8.13+ now also track the number of
  "quarantined" messages. You must delete any existing sendmail.rrd
  files for this new data item to be tracked.
* All RRD-updates now use the RRDtool "--template" option to map
  values to RRD datasets. This should keep us from updating
  datasets with wrong values.
* Graphs can now get their title from a script, instead of using a
  static string in hobbitgraph.cfg. This lets you e.g. pick up the
  graph title from the mrtg.cfg.
* All CGIs now pick up commandline options from a new configuration
  file, hobbitcgi.cfg. This is to ease packaging and make sure the
  CGIs can be updated without losing local configurations.
* The debian- and rpm-specific packaging files are now included in
  the distribution tarfile.
* A historical status-log now includes a "Full History" button to
  go to the history overview.
* "irix" is now recognized as an operating system. A preliminary
  vmstat RRD will be generated, but this may change in a future
  version. "netstat" reports are not yet handled.
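The RRDDISKS / NORRDDISKS filtering described in the 4.0.3 notes can be pictured as an include/exclude pair of regular expressions. This is a hedged Python illustration, not Hobbit's implementation: it assumes RRDDISKS selects the filesystems to track and NORRDDISKS excludes them, that an unset pattern imposes no restriction, and it uses unanchored re.search(). The track_filesystem() name is hypothetical; hobbitserver.cfg(5) remains the authority on the real semantics.

```python
import re
from typing import Optional


def track_filesystem(name: str,
                     rrddisks: Optional[str] = None,
                     norrddisks: Optional[str] = None) -> bool:
    """Decide whether a filesystem should get an RRD file.

    Assumed semantics: if an include pattern (RRDDISKS) is set, the
    name must match it; if an exclude pattern (NORRDDISKS) is set,
    the name must not match it.
    """
    if rrddisks is not None and not re.search(rrddisks, name):
        return False
    if norrddisks is not None and re.search(norrddisks, name):
        return False
    return True


if __name__ == "__main__":
    mounts = ["/", "/var", "/home", "/proc", "/mnt/cdrom"]
    # Exclude /proc and anything under /mnt; everything else is tracked.
    print([m for m in mounts if track_filesystem(m, norrddisks=r"^/(proc|mnt)")])
```

Checking the exclude pattern last means a filesystem matched by both patterns is not tracked, which seems the safer default for a "NO" setting.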

Changes from 4.0 -> 4.0.2 (10 Apr 2005)
---------------------------------------

Bugfixes:
* "meta" reports could crash hobbitd.
* Parsing of HTTP responses could crash bbtest-net.
* Eventlog entries from hosts not in bb-hosts would crash the eventlog CGI reporting tool.
* bbgen would crash when FQDN was set to FALSE (not the default) and a host was listed in bb-hosts with its fully-qualified name.
* The external rrd-module might launch more than one script simultaneously, which resulted in the status message being lost.
* AS/400 disk reports were not handled correctly by the RRD module, since the format was different from what was expected.
* bbtest-net would do DNS lookups of hostnames when run with the "--dns=ip" option, causing a severe slowdown.
* When viewing historical disk reports from Unix systems, the "Status unchanged in ..." message might be included in the last line of the disk status report, instead of being on a line by itself at the bottom of the page.
* "badconn" tags were not being recognized. Eric Schwimmer found the problem and even provided a patch.
* On Solaris, the HOME environment variable may not be defined if Hobbit is started from a bootup-script. This caused network tests to stop running.
* Historical logs that are saved when a host is disabled or goes purple would not reflect the blue/purple color, but the color of the status before it was disabled or went purple.
* A spelling error in the hobbitgraph.cfg file caused the hobbitgraph CGI to crash when trying to generate mailq graphs.
* Some BSD systems do not have an atoll() routine, which broke compilation of the hobbitd_larrd module. Switched to use Hobbit's own atoll() for this platform.
* Duration-times could be mis-reported on the "info" page, due to some bad math done when building this page.

Improvements:
* In bb-hosts you can now define a host called ".default.". The tags set for this host will be the default tag settings for the following hosts (until a new .default. host appears).
* Merged the bb-infocolumn and bb-larrdcolumn functionality into the service-display CGI, and made bb-hostsvc into a Hobbit-only CGI called hobbitsvc.cgi. The bb-infocolumn and bb-larrdcolumn binaries have been removed.
* The --alertcolors, --okcolors and --repeat options for hobbitd, hobbitd_alert and bb-infocolumn were moved to settings in hobbitserver.cfg (ALERTCOLORS, OKCOLORS and ALERTREPEAT respectively).
* A new generic RRD handler was added for name-colon-value reports. This can be used to handle RRD tracking of reports where the data is in lines of the form "Name: Value".
* A new "notrends" tag for bb-hosts entries drops the "trends" tag for a host (similar to the "noinfo" tag).
* A new "DOC:" tag for bb-hosts entries lets you set a documentation URL for each host, or as a default using the ".default." hostname. The --docurl option on bb-infocolumn (which no longer exists) has been dropped.
* Included the "hobbitreports.sh" script to show how reports can be pre-generated by a cron-job.
* A new "bb-datepage.cgi" CGI makes it easy to select daily/weekly/monthly pre-generated reports.
* The hobbitgraph CGI now allows you to show weekday- and month-names in your local language, instead of forcing them to English. Note that the fonts used may not include all non-ASCII characters.
* CPU-reports from the bb-xsnmp.pl script should now be handled correctly by the RRD module.
* CPU-reports for z/VM should now be handled correctly by the RRD module.
* CPU-, disk- and memory-reports from the nwstats2bb script are now handled by the RRD module.

Changes from 4.0 -> 4.0.1 (31 Mar 2005)
---------------------------------------

Bugfixes:
* Compiling Hobbit on some platforms would fail because a couple of standard include files were not being pulled in.

Changes from RC-6 -> 4.0 (30 Mar 2005)
--------------------------------------

Bugfixes:
* DOWNTIME handling was broken in some cases. While fixing this, it was also enhanced to allow multiple day-specifications in a single time-specification, e.g. "60:2200:2330" is now a valid way of defining downtime between 10 PM and 11:30 PM on Saturday and Sunday.
* If bbtest-net was run with the "--dns=ip" option to disable DNS lookups, a DHCP host could not be tested. Changed so that hosts with a "0.0.0.0" IP-address always do a DNS lookup to determine the IP-address for network tests.
* Links in the RSS file were missing a trailing slash.
* Disk- and CPU-reports from AS/400 based systems were not handled properly by the RRD parser. Note that this change means the PCRE library is now needed for the hobbitd_larrd module.
* The default hobbitlaunch.cfg had the environment path for the bbcombotest task hard-coded, instead of picking up the configuration value.
* Backed out the cookie-based filtering of hosts in the Enable/Disable (maint.pl) script, since it breaks in too many places. This needs further investigation.
* Alert rules with the STOP and UNMATCHED keywords now flag this on the info-page alert table.
* If a host was removed from the bb-hosts file, and hobbitd reloaded the file before a "drop" command was sent, then it was impossible to get rid of the internal state stored for the host without restarting Hobbit.
* Disabled status-logs could go purple while still being disabled; this would show up as the status alternating between blue and purple.

Improvements:
* If invoking fping failed, the error message was lost and only a "failed with exit code 99" error was reported. Changed so that the real cause of the error is reported in the bbtest-net log.
* An "mrtg" definition was added to hobbitgraph.cfg, to handle MRTG-generated RRD files. If MRTG is configured to save RRD files directly to the Hobbit rrd-directory, these graphs can be handled as if they were native Hobbit graphs. Wrote up hobbit-mrtg.html to describe this setup.
* When hosts are removed from the bb-hosts file, all in-memory data stored about the host is dropped automatically, so e.g. alerts will no longer be sent out. Data stored on disk is unaffected; this only gets removed when a "drop" command is issued.
* Time-specifications now accept multiple week-days, e.g. you can define a host that is down Sunday and Monday 20:00->23:00 with "DOWNTIME=01:2000:2300". This applies globally, so it also applies to alert-specifications and other time-related settings using the NKTIME syntax. Hobbit will also complain rather loudly in the logfile if an invalid time-specification is found in one of the configuration files.
* hobbit-mailack was added to the HTML man-page index and to the overview description in hobbit(7).
* A new "trimhistory" tool was added, allowing you to trim the size of history logs and historical status-log collections.

Changes from RC-5 -> RC-6
-------------------------

Bugfixes:
* Recovery messages were sent to all recipients, regardless of any color-restrictions on the alerts they received. Changed this so that recipients only get recovery messages for the alerts they received.
* The "NOALERT" option was not applied when multiple recipients were listed in one rule.
* bbtest-net now performs a syntax check on all URLs before adding them to the test queue. This should stop it from crashing in case you happen to enter a syntactically invalid URL in your bb-hosts file.
* The acknowledgment log on the BB2 page could mix up data from different entries in the log.
* The default mail-utility used to send out e-mail alerts is now defined per OS. Solaris and HP-UX use "mailx", others use "mail".
* Client tests no longer go purple when a host has been disabled.
* bb-larrdcolumn no longer dumps core if there are no RRD files.
* With the right input, bb-larrdcolumn could use massive amounts of memory and eventually terminate with an out-of-memory error.
* A memory leak in hobbitd_larrd handling of "disk" reports was fixed.
* bb-infocolumn now accepts a "--repeat=N" setting to inform it of the default alert-repeat interval. If you use --repeat with hobbitd_alert, you should copy that option to bb-infocolumn to make it generate correct info-column pages.
* If bbgen cannot create output files or directories, the underlying error is now reported in the error message.
* The "merge-lines" and "merge-sects" tools used during installation could crash due to a missing initialization of a pointer.

Improvements:
* It is now possible to make Hobbit re-open all logfiles, e.g. after a log rotation. Use "server/hobbit.sh rotate".
* The hobbit-mailack tool now recognizes the BB format of alert message responses, i.e. putting "delay" and "msg" in the subject line will work.
* bbcmd defaults to running /bin/sh if no command is given.
* hobbitd_larrd now logs the sender IP of a message that results in an error.
* A network test definition for SpamAssassin's spamd daemon was added.
* The default web/*header files now refer to a HOBBITLOGO setting for the HTML used in the upper-left corner of all pages. The default is just the text "Hobbit", but you can easily replace this with e.g. a company logo by changing this setting in hobbitserver.cfg.
* The Hobbit daemon's "hobbitdboard", "hobbitdxboard" and "hobbitdlist" commands now support a set of primitive filtering techniques to limit the number of hosts returned.
* maint.pl uses the new Hobbit daemon filtering and a cookie defined by the header in webpages to show only the hosts found on the page where it was called from, or just a single host.
* Hobbit should now compile on Mac OS X (Darwin).
* The info- and graph-column names are now defined globally as environment variables "INFOCOLUMN" and "LARRDCOLUMN", respectively. This eliminates the need to have them listed as options for multiple commands. Consequently, the --larrd and --info options have been dropped.
* Systems with the necessary libraries (RRDtool, PCRE, OpenSSL etc.) in unusual locations can now specify the location of these as parameters to the configure script, overriding the auto-detect routine. See "./configure --help" for details.
* A definition for the "disk1" graph in LARRD was added; this shows the actual use of filesystems instead of the normal percentage.

Changes from RC-4 -> RC-5
-------------------------

Bugfixes:
* Very large status- or data-messages could overflow the shared memory segment used for communication between hobbitd and the worker modules. This would cause hobbitd to crash. Such messages are now truncated before being passed off to worker modules.
* hobbitd_alert no longer crashes when trying to send a message.
* Recovery messages were sent, even when no alert message had been sent. This caused unexpected "green" messages.
* The router dependency "route" tag handling was broken.

Improvements:
* The "starthobbit.sh" script now refuses to start Hobbit if it is already running.
* The "starthobbit.sh" script was renamed to just "hobbit.sh". It now also supports a "reload" command that causes hobbitd to reload the bb-hosts file.
* The bb-hist CGI now supports the NAME tag for naming hosts.
* The history CGIs showed the host- and service-names twice when in Hobbit mode. Once is enough.
* A "NOALERT" setting in hobbit-alerts.cfg was implemented, so it is easier to define recipients who only get notice-messages and not alerts.
* The input parameter check for CGI scripts was relaxed, so that special characters are permitted, e.g. when passing a custom hostname to a CGI. Since we do not use any shell scripts in CGI handling, this should not cause any security problem.

Building Hobbit:
* The /opt/sfw directory is now searched for libraries (Solaris).
* The order of libssl and libcrypto has been swapped, to avoid linker problems on some platforms.
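
The RC-5 fix for oversized messages amounts to bounding the copy into the shared-memory buffer. The C sketch below assumes a plain character buffer of known size. copy_to_shm() is a hypothetical name, and this sketch does not describe the real hobbitd buffer layout or size limit.

```c
#include <string.h>

/* Hypothetical: copy msg into buf (capacity bufsz bytes), truncating
 * if needed so the result always fits and is NUL-terminated.
 * Returns 1 if the message had to be truncated, 0 if it fit. */
static int copy_to_shm(char *buf, size_t bufsz, const char *msg)
{
    size_t len = strlen(msg);

    if (bufsz == 0) return 1;          /* nothing fits, not even the NUL */
    if (len >= bufsz) {
        memcpy(buf, msg, bufsz - 1);   /* keep as much as fits */
        buf[bufsz - 1] = '\0';
        return 1;
    }
    memcpy(buf, msg, len + 1);         /* copy including the NUL */
    return 0;
}
```

In this sketch, the return flag lets the caller log that a message was cut short instead of passing on partial data silently.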

Changes from RC-3 -> RC-4
-------------------------

Bugfixes:
* Loading the bb-services file no longer causes bbtest-net, hobbitd_larrd et al. to crash.
* The alert configuration loader was fixed, so that recipient criteria are applied correctly.
* hobbitd_alert handling of "recovered" status messages was slightly broken. This was probably the cause of the unexpected "green" alerts that some have reported.
* SCRIPT recipients can now have a "@" in their names without being silently treated as MAIL recipients.
* An acknowledge message is now cleared when the status changes to an OK color (defined by the --okcolors option). Previously it would have to go "green" before the ack was cleared.
* Acked and disabled statuses cannot go purple while the acknowledge or disable is valid. This was causing brief flickers of purple for tests that were disabled for more than 30 minutes.
* maint.pl "combo" message support was dropped. This was causing runtime warnings, and it has never been possible to send enable or disable messages inside combos (neither with Hobbit nor BB).

Building Hobbit:
* bb-infocolumn should build without problems again.
* The "configure" script now also looks in /opt/csw for tools and libraries (for Solaris/x86).
* An OpenBSD Makefile was contributed.
* The gcc option "-Wl,--rpath" is now used when linking the tools on Linux and *BSD. This should resolve a lot of the issues with runtime libraries that cannot be found.
* "configure" now looks for your perl utility, and adjusts the maint.pl script accordingly.
* HP-UX does not have an atoll() routine, so a simple implementation of this routine was added.

Configuration file changes:
* hobbitlaunch.cfg now supports a DISABLED keyword to shut off unwanted tasks. This permits upgrading without having to re-disable stuff.
* All commands in hobbitserver.cfg are now quoted, so it can be sourced by the CGI scripts without causing errors. Note that this is NOT automatically changed in an existing configuration file.
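
The HP-UX entry above mentions a simple atoll() replacement. The C sketch below shows what such a routine can look like. my_atoll() is a hypothetical name, and this is not the code shipped with Hobbit. Like atoll(), it stops at the first non-digit and performs no overflow checking.

```c
#include <ctype.h>

/* Hypothetical replacement for atoll() on systems that lack it.
 * Skips leading whitespace, accepts an optional sign, then reads
 * decimal digits until the first non-digit character.
 * No overflow detection: out-of-range input simply wraps. */
static long long my_atoll(const char *s)
{
    unsigned long long result = 0;   /* unsigned, so wrap-around is defined */
    int negative = 0;

    while (isspace((unsigned char)*s)) s++;
    if (*s == '+' || *s == '-') {
        negative = (*s == '-');
        s++;
    }
    while (isdigit((unsigned char)*s)) {
        result = result * 10 + (unsigned long long)(*s - '0');
        s++;
    }
    /* Negate in unsigned arithmetic, then convert back to signed */
    return negative ? (long long)(0ULL - result) : (long long)result;
}
```

For example, my_atoll("  -42abc") returns -42, the same result the standard atoll() gives for that input.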

Improvements:
* The detailed status display now lets you define which graphs should be split onto multiple graph images (the "--multigraphs" option for bb-hostsvc.cgi and hobbitd_filestore). Currently the "disk", "inode" and "qtree" graphs are handled this way.
* The detailed status display now includes a line showing how long an acknowledgment is valid. This is configurable via the ACKUNTILMSG setting in hobbitserver.cfg.
* A new "notify" message is supported as part of the Hobbit protocol. It takes a normal hostname+testname parameter, plus a text message; this is sent out as an informational message, using the hobbit-alerts.cfg rules to determine recipients. This replaces the BB "notify-admin" recipient with a more fine-grained and configurable system. Currently used by maint.pl when enabling and disabling tests.
* Alert scripts now receive a CFID environment variable with the line number of the hobbit-alerts.cfg entry that caused this alert to go out.
* A new tool, hobbit-mailack, was added. If set up to run from e.g. the Hobbit user's .procmailrc file, you can acknowledge alerts by responding to an alert e-mail.
* Temperature reports now accept additional text in parentheses without being confused.

Changes from RC-2 -> RC-3
-------------------------

Configuration file changes:
* The bb-services file format was changed slightly. Instead of "service foo" to define a service, it is now "[foo]". Existing files will be converted automatically by "make install".
* The name of the "conn" column (for ping-tests) is used throughout Hobbit, and had to be set in multiple locations. Changed all of them to use the setting from the PINGCOLUMN environment variable, and added this to hobbitserver.cfg.
* The --purple-conn option was dropped from hobbitd. It should be removed from hobbitlaunch.cfg.
* The --ping=COLUMNNAME option for bbtest-net should not be used any more. "--ping" enables the ping tests; the name of the column is taken from the PINGCOLUMN variable.
* The GRAPHS setting in hobbitserver.cfg no longer needs to have the simple TCP tests defined. These are automatically picked up from the bb-services file.

Bugfixes:
* hobbitd no longer crashes if the MACHINE name from hobbitserver.cfg is not listed in bb-hosts. Thanks to Stephen Beaudry for helping me track down this bug.
* If hobbitd crashed, then hobbitlaunch would attempt to restart it immediately. Added a 5 second delay, so that there's time for the OS to clean up any open sockets, files etc. that might prevent a restart from working.
* The "disk" RRD handler could be confused by reports from a Unix server, and mistake it for a report from a Windows server. This caused the report to try and store data in an RRD file with an invalid filename, so no graph-data was being stored.
* The "cpu" and "disk" RRD handlers were enhanced to support reports from the "filerstats2bb" script for monitoring NetApp systems. The disk-handler also supports the "inode" and "qtree" reports from the same script.
* bb-services was overwritten by a "make install". This wiped out custom network test definitions.
* bbnet would crash if you happened to define a "http" or "https" test instead of using a full URL.
* bbnet was mis-calculating the size of the URL used for the apache-test. This could cause it to overflow a buffer and crash.
* hobbitd would ignore the BBPORT setting and always default to using port 1984.
* Portability problems on HP-UX 11 should be resolved. From reports it appears that building RRDtool on HP-UX 11 is somewhat of a challenge; however, the core library is all that Hobbit needs, so build-problems with the Perl modules can be ignored as far as Hobbit is concerned.
* hobbitd_alert could not handle multiple recipients for scripts, and mistakenly assumed all recipients with a "@" were for e-mail recipients.
* Alert messages no longer include the "