- To shorten a full trawling cycle and reduce database space requirements, a maximum catalogue depth can be set. All files are still touched during trawling, but summarised details are stored only down to the level specified. For example, suppose user file system data is located under d:\share\data\users and each username appears as a subfolder of that folder. Setting the catalogue depth to 4 lists every username folder, with all data in its subfolders summarised at that level; no deeper subfolders are displayed in the console unless explicitly configured.
- Certain folders may need to be excluded from the scan. System folders such as C:\windows or ~snapshot, for example, can be added as exclusions.
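The effect of the two settings above can be pictured with a minimal Python sketch. It is illustrative only and is not the product's implementation: the function names are hypothetical, folder levels are assumed to be counted below the drive root (so that d:\share\data\users\<username> is level 4, matching the example above), and an exclusion is assumed to be either a full folder path or a bare folder name.

```python
from collections import defaultdict
from pathlib import PureWindowsPath


def catalogue_key(file_path, depth):
    """Return the folder, at most `depth` levels below the drive root,
    into which a file's details are summarised (assumed convention)."""
    parts = PureWindowsPath(file_path).parent.parts
    # parts[0] is the drive root; keep it plus at most `depth` folder names.
    return str(PureWindowsPath(*parts[: depth + 1]))


def is_excluded(file_path, exclusions):
    """Treat an exclusion as a full folder path or a bare folder name (assumption)."""
    path = PureWindowsPath(file_path)
    for exclusion in exclusions:
        excluded = PureWindowsPath(exclusion)
        if excluded.is_absolute():
            # Full path: exclude everything beneath it (comparison ignores case).
            if excluded in path.parents:
                return True
        elif exclusion.lower() in (part.lower() for part in path.parent.parts):
            # Bare name: exclude any folder with that name, at any level.
            return True
    return False


def summarise(files, depth, exclusions=()):
    """Every file is still visited, but totals are kept only per catalogue folder."""
    summary = defaultdict(lambda: {"count": 0, "size": 0})
    for file_path, size in files:
        if is_excluded(file_path, exclusions):
            continue
        entry = summary[catalogue_key(file_path, depth)]
        entry["count"] += 1
        entry["size"] += size
    return dict(summary)


# Hypothetical sample data: (path, size in bytes)
files = [
    ("d:\\share\\data\\users\\alice\\docs\\a.txt", 100),
    ("d:\\share\\data\\users\\alice\\docs\\old\\b.txt", 50),
    ("d:\\share\\data\\users\\bob\\c.txt", 10),
    ("d:\\share\\data\\users\\bob\\~snapshot\\c.txt", 10),
    ("C:\\Windows\\system32\\x.dll", 999),
]
report = summarise(files, depth=4, exclusions=["C:\\windows", "~snapshot"])
```

In this sketch both of alice's files are rolled up into her username folder, which is why a lower depth reduces the number of rows stored without reducing the number of files visited.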
Once the data has been collected, the remote agents continue to update the summarised information on an ongoing basis. Detailed reports for capacity planning or billing purposes can then be generated without having to wait for scanning to complete. Reports can also be generated automatically on a schedule, with an email notification sent when each report is ready.
File System
At least one remote agent is configured to perform Data Volume Discovery and, once its schedule is active, trawls the file system, summarising data such as file types, folder permissions, item owners, size and count. The Offline attribute is checked and, if the local file is a stub, its count and size details are recorded separately.
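As a rough illustration of this kind of trawl, the following Python sketch walks a folder tree and summarises count and size per file type, keeping stub files separate. It is a simplified sketch under stated assumptions and not the agent's actual logic: the function name `trawl` is hypothetical, folder permissions and item owners are omitted, and the stub check relies on the Windows Offline file attribute bit (0x1000), which Python exposes through `st_file_attributes` on Windows only.

```python
import os
from collections import defaultdict

FILE_ATTRIBUTE_OFFLINE = 0x1000  # Windows file attribute bit for offline (stub) files


def trawl(root):
    """Touch every file under `root` and summarise count and size per file type."""
    summary = defaultdict(
        lambda: {"count": 0, "size": 0, "stub_count": 0, "stub_size": 0}
    )
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            try:
                info = os.stat(os.path.join(folder, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            file_type = os.path.splitext(name)[1].lower() or "(none)"
            # st_file_attributes exists only on Windows; elsewhere assume no stubs.
            attributes = getattr(info, "st_file_attributes", 0)
            entry = summary[file_type]
            if attributes & FILE_ATTRIBUTE_OFFLINE:
                entry["stub_count"] += 1
                entry["stub_size"] += info.st_size
            else:
                entry["count"] += 1
                entry["size"] += info.st_size
    return dict(summary)
```

Calling `trawl(r"d:\share\data")`, for instance, would return one entry per file extension found beneath that folder.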
For maximum throughput, remote agents should be configured to trawl local hard disks; where multiple file servers exist, one agent would ideally be installed on each server. Should this not be possible (when trawling a NAS device, for example), the remote agent can be pointed at a UNC path in order to trawl across the network. In that case the physical connection should be as short as possible to minimise any bottlenecks created by the network.
Public Folders
A single remote agent is given the task of creating a catalogue of the public folder estate, that is, a list of all folders that are to be trawled. Once the catalogue is complete, one or more remote agents start to trawl the public folders, touching each message and summarising data. Details collected include item size and count, item owners (based upon sender and receiver counts), folder permissions, item modified dates and attachment file types. The IPM message type is also interrogated and, if the item is found to be a shortcut, the original size and count are recorded separately. Using the ownership information, the folder can then be assigned an owner and the data moved or copied into a PST file for onward transmission to an archive repository.
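One way to picture ownership derived from sender and receiver counts is the Python sketch below. The exact rule the product applies is not described here, so the sketch simply assumes that the address appearing most often as sender or recipient is proposed as the folder owner. The message dictionaries, their field names and the function name are all hypothetical stand-ins for data an agent would read from the public folder store.

```python
import os
from collections import Counter


def summarise_folder(messages):
    """Summarise one public folder from a list of message records."""
    people = Counter()
    attachment_types = Counter()
    total_size = 0
    for message in messages:
        total_size += message["size"]
        # Count every appearance as sender or recipient (assumed ownership rule).
        people[message["sender"].lower()] += 1
        for recipient in message["recipients"]:
            people[recipient.lower()] += 1
        for filename in message.get("attachments", []):
            attachment_types[os.path.splitext(filename)[1].lower()] += 1
    owner = people.most_common(1)[0][0] if people else None
    return {
        "count": len(messages),
        "size": total_size,
        "owner": owner,
        "attachment_types": dict(attachment_types),
    }


# Hypothetical sample messages
messages = [
    {"sender": "alice@example.com", "recipients": ["bob@example.com"],
     "size": 1000, "attachments": ["plan.docx"]},
    {"sender": "alice@example.com", "recipients": ["carol@example.com"],
     "size": 500, "attachments": []},
    {"sender": "bob@example.com", "recipients": ["Alice@example.com"],
     "size": 250, "attachments": ["a.PDF", "b.pdf"]},
]
folder_summary = summarise_folder(messages)
```

A real assignment would probably weigh other signals as well, such as folder permissions, before an owner is confirmed.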