
This document explains how to install Ruminate.

Before following this installation guide, it is recommended that you read the documentation and technical reports about Ruminate on the website: ruminate-ids.org.


Hardware
--------

The reference architecture involves a "master" server connected to a passive network tap from which the traffic to be analyzed is captured. The master is also connected, through a separate private network or cloud, to additional analyzer nodes.

Operating System
----------------

Ruminate has been used on Linux. It should be portable to other Unix-like operating systems, but this has not been tested.

Prerequisite Software
---------------------

In addition to the typical system utilities found on Linux, Ruminate requires the following software:

Apache with PHP
ClamAV
jsunpack-n (installed at /shared/bin/jsunpack-n) and its dependencies
XORSearch
pdftk
Perl and various modules such as HTTP::Parser, MIME::Tools, etc.
Python
Vortex IDS

Many of these prerequisites are expected to be installed in /shared/bin.
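Whether the prerequisites are actually available can be sanity-checked with a small shell loop (a sketch; the tool names below are examples from the list above, and some sites will have them only under /shared/bin):

```shell
# check_tool NAME: report whether NAME is on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

# Check a few of the prerequisites listed above.
for tool in clamscan pdftk xorsearch perl python; do
  check_tool "$tool"
done
```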


Stream capture, reassembly, and distribution code
------------------------------------------------

The stream capture and reassembly code is based largely on Vortex IDS. It is recommended that you build, install, and read the documentation for Vortex IDS before proceeding to install Ruminate.

ruminate_server performs packet capture, filtering, and stream reassembly. Reassembled streams are provided to the next available ruminate_client for analysis. The protocol over which TCP stream data is transferred from ruminate_server to ruminate_client is documented in the ruminate_client code.

Compile ruminate_server and ruminate_client as found in the 'src' dir (compile instructions are near the top of the source files). The ruminate_server binary is required on the master or capture server. The ruminate_client binary is required on the analysis nodes.

Stream Processing and Object extraction code
--------------------------------------------

TCP stream parsing, including layer 7 protocol analysis (e.g. HTTP, SMTP) and client application object extraction, occurs on the analyzer nodes.

All stream processing and object extraction code is written in scripting languages such as bash or perl. No compilation should be necessary, except for any requisite Perl modules.

Parsing code is found in the 'bin' dir and includes http_parser, smtp_parser, and mime_parser. It should be installed at /shared/bin/. Additionally, the object routing scripts, including object_mux and the *-helper scripts, should also be installed at /shared/bin.
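The copy step above can be sketched as a small installer function (the function name is illustrative; the layout assumed is the 'bin' dir described above, and the *-helper scripts should be copied the same way):

```shell
# install_parsers SRC DEST: copy the stream parsers and the object
# routing script from the source tree into the shared bin directory.
install_parsers() {
  src="$1"; dest="$2"
  mkdir -p "$dest"
  for f in http_parser smtp_parser mime_parser object_mux; do
    cp "$src/$f" "$dest/$f"
    chmod 755 "$dest/$f"
  done
}

# Typical invocation from the ruminate source tree:
# install_parsers bin /shared/bin
```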

The object_mux script performs magic number inspection of objects transferred through the network and, using the *-helper scripts, routes objects to the appropriate object analysis services based on policy.
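The routing idea can be illustrated with a simplified sketch; the helper names below are placeholders, and the real object_mux consults site policy in addition to the magic number:

```shell
# route_object FILE: pick a helper based on the object's magic number.
# (Simplified sketch: the real object_mux also applies site policy.)
route_object() {
  magic=$(head -c 4 "$1")
  case "$magic" in
    '%PDF') echo pdf-helper ;;      # PDF documents
    PK*)    echo zip-helper ;;      # ZIP containers (also docx, jar, ...)
    *)      echo archive-helper ;;  # everything else: archive only
  esac
}
```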

Object analysis (web) services
------------------------------

Client application objects are analyzed by object analysis services. Currently, all object analysis services are implemented as simple web (HTTP) services. Objects are POSTed to the web services using the *-helper scripts.

In the reference implementation, all object analysis services are implemented on the "master" server. This could easily be modified. Load balancing could be performed using standard web load balancing technologies.

The code for the object analysis services is found in the 'www' dir. It should be installed on the web server(s) under the "ruminate" directory. E.g. PDFs will be POSTed to http://master/ruminate/pdf.php.
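A helper's submission boils down to an HTTP file upload; a minimal sketch with curl (the form field name "file" is an assumption — check the *-helper scripts for the exact fields the services expect):

```shell
# post_object FILE URL: POST an object to an analysis service as a
# multipart file upload, the way a *-helper script would.
post_object() {
  curl -s -F "file=@$1" "$2"
}

# Example: submit a PDF to the PDF service on the master server.
# post_object sample.pdf http://master/ruminate/pdf.php
```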

The object analysis services require many of the prerequisites listed above. These prerequisites need only be installed on the server(s) that use them.

PHP configuration may need to be modified to support large file uploads, long running scripts, etc.
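For example, directives along these lines in php.ini raise the relevant limits (the values are illustrative assumptions; size them to your traffic):

```ini
; Illustrative php.ini overrides for the object analysis services.
upload_max_filesize = 100M   ; largest object the services will accept
post_max_size = 100M         ; must be >= upload_max_filesize
max_execution_time = 300     ; some analyses (e.g. jsunpack-n) are slow
```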

File Systems and Permissions
----------------------------

Ruminate can make use of network file systems to store data in a central location, but that is not necessary. The archive object service needs to be able to write to /shared/data/archive. The PDF service needs to be able to write to /shared/data/pdf_metadata.

Most components expect the existence of a tmpfs at /dev/shm, as is common on most modern Linux distros. Many components also expect /dev/shm/payloads to exist. This directory must be created such that the protocol parsers have write access to it.
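A sketch of the directory setup (the ownership and permissions below are assumptions — adjust them so that whatever user the parsers run as can write there):

```shell
# Create the tmpfs-backed payload directory used by the parsers.
PAYLOAD_DIR="${PAYLOAD_DIR:-/dev/shm/payloads}"
mkdir -p "$PAYLOAD_DIR"
# World-writable with the sticky bit here; a locked-down site would
# instead chown this to the ruminate user and use mode 700.
chmod 1777 "$PAYLOAD_DIR"
```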

It is highly recommended that you configure PHP so that temporary files are written to tmpfs. E.g. set "upload_tmp_dir" to "/dev/shm" in php.ini.

Logging/Alerting
----------------

Ruminate provides for extensive collection of metadata. It is assumed that in a production environment this metadata would be forwarded to an external event auditing/log correlation system. Ruminate is in the business of network payload object analysis, including exposing embedded object metadata; it is not in the business of advancing event auditing and log correlation systems.

As is, Ruminate uses syslog for logging. Facilities local0 through local6 are used. The rough usage of these facilities can be deduced from the following configuration:

local0.*                                                -/var/log/ruminate/stats.log
local1.*                                                -/var/log/ruminate/http.log
local2.*                                                -/var/log/ruminate/smtp.log
local3.*                                                -/var/log/ruminate/object.log
local4.*                                                -/var/log/ruminate/pdf.log
local5.*                                                -/var/log/ruminate/zip.log
local6.*                                                -/var/log/ruminate/misc.log

If the above configuration is installed on the master server, the master node is configured to accept syslog messages from the analyzer nodes, and the following is installed on the analyzer nodes, then all logs will be centralized on the master node:

local0.*;local1.*;local2.*;local3.*;local4.*;local5.*;local6.*  @master


Note that the metadata Ruminate currently collects from protocol parsing should be customized based on the needs of the organization. For example, the HTTP parser currently extracts metadata useful for studying objects transferred through the network (e.g. the Content-Type header) but excludes metadata useful for security in operational environments (e.g. the Referer header). The SMTP parser intentionally neglects collection of critical metadata to protect the privacy of users in an academic environment.


Starting Ruminate
-----------------

To start Ruminate, ensure all the scanning services are in place. The various helper scripts can be used to test the scanning services.

Next, the protocol parsers should be started. They can be started as follows (on each analysis node):

mkdir /dev/shm/payloads
while [ 1 ]; do ruminate_client -h master -p 6000 -t /dev/shm -b 1 -l -d -e 1 -c 10000 | http_parser | object_mux; done &
while [ 1 ]; do ruminate_client -h master -p 6000 -t /dev/shm -b 1 -l -d -e 1 -c 10000 | http_parser | object_mux; done &
while [ 1 ]; do ruminate_client -h master -p 6000 -t /dev/shm -b 1 -l -d -e 1 -c 10000 | http_parser | object_mux; done &
while [ 1 ]; do ruminate_client -h master -p 6000 -t /dev/shm -b 1 -l -d -e 1 -c 10000 | http_parser | object_mux; done &
ruminate_client -h master -p 6001 -t /dev/shm -b 1 -l -e 1 | smtp_parser /dev/shm/payloads | mime_parser /dev/shm/payloads | object_mux &

Note that the clients connect to the master server to request streams to process, on port 6000 for HTTP and port 6001 for SMTP. Four HTTP parsers and one SMTP parser are started per analysis node. Also note that the while loop is used as a workaround for an apparent memory leak in the HTTP parser, and to demonstrate that parsers can be started and stopped dynamically. This demonstrates the dynamic nature of the stream load balancing that occurs in Ruminate.

Last, the packet capture/replay and stream reassembly must be started. On the master node, execute something like:

sudo ruminate_server -i eth1 -s 4000000 -Q 100000 -S 2000000000 -C 2000000000 -K 600 -e -V 6000 -u RUMINATE_USER -M 1600 -E 600 -T 600 -L HTTP -Q 100000 -f "tcp port 80" &
sudo ruminate_server -i eth1 -s 1000000 -Q 100000 -S 2000000000 -C 0 -K 600 -e -V 6001 -u RUMINATE_USER -M 1600 -E 600 -T 600 -L MAIL -Q 100000 -f "tcp port 25" &

Note that RUMINATE_USER needs to be replaced with the (non-privileged) user under which Ruminate runs. The majority of the options are inherited from Vortex; see the Vortex documentation for tuning these parameters. Many sites will also want to modify the filter specified at the end of the command to match site policy/traffic.
