Multi-Cue ISM Documentation =========================== General Information =================== This program, and all associated files or parts thereof, are made available exclusively for non-commercial research purposes. Any commercial use or resale of this software requires a license agreement with the author and the Computer Vision Group at RWTH Aachen. The code and binaries are under copyright protection. If you are interested in commercialization, please contact the author under the following email address: leibe@vision.rwth-aachen.de. Copyright Bastian Leibe, Computer Vision Group, RWTH Aachen, 2008-2012. Computer Vision Laboratory, ETH Zurich, 2006-2008. Multimodal Interactive Systems Group, TU Darmstadt, 2004-2005. Parts of the package may contain code that is copyrighted by other parties. In particular, the subdirectory "code" contains interest point detectors and region descriptors made available by Krystian Mikolajczyk (kma@robots.ox.ac.uk) for non-commercial research use. Intellectual property for those parts has to be respected, as well. Disclaimer ---------- THIS CODE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. Use at your own risk. Further Information ------------------- An explanation of the employed algorithms can be found in the following papers: Bastian Leibe, Ales Leonardis and Bernt Schiele, Robust Object Detection with Interleaved Categorization and Segmentation In International Journal of Computer Vision, Vol.77, No. 1-3, May 2008. Bastian Leibe, Krystian Mikolajczyk, and Bernt Schiele, Segmentation-Based Multi-Cue Integration for Object Detection In British Machine Vision Conference (BMVC'06), 2006. Contents: ========= I. Quick-Start Instructions I.1 Using the Provided Detectors I.2 Important Detector Parameters I.3 Using a Ground Plane Calibration I.4 Caveats II. In-Detail Description of the GUI II.1 Description of the Main Window Interface Elements II.2 Description of the Detector Window GUI II.3 Description of the Feature Extraction Window GUI III. Description of the Command Line Interface IV. Description of the File Formats Used IV.1 IDL Format IV.2 Calibration File Format IV.3 Matlab-Readable Recognition Result Format IV.4 Matlab Workspace Format for Storing Result Segmentations I. Quick-Start Instructions =========================== I.1.) Using the Provided Detectors ---------------------------------- I.1.0.) After the program has been installed, it can be started by typing "./start.sh &" from the command line interface. I.1.1.) Before anything can be recognized, the system first needs to load a detector file. Several pre-trained detectors are already available with the installed code. Others can be downloaded from our webpage (http://www.vision.ee.ethz.ch/bleibe/code/). In order to load a detector, click on the button "Add Detector" on the left side of the program window. A new window with the title "Detector 1" will open up. Click on the "Load" button in the bottom row of this window and select a detector file for loading (detector files can be recognized by the suffix ".det"). Once the detector is fully loaded, it will be displayed in the detector table on the left side of the program window. Note that when loading a detector, all GUI parameters will be automatically set to the saved default values for this detector. 
If the imaging conditions of the test data vastly differ from those used during training, it may be beneficial to adapt some of those parameters. How that is done is explained in Section I.2 below. I.1.2.) Now the system is ready to recognize objects of the trained category. Click on "Process Test Image" to load an image and start the recognition process. The system will automatically compute interest points with the selected detector, extract local features from the detected regions, match them to the codebook, and generate probabilistic votes in a Hough voting space. The maxima in this space are then extracted using Mean-Shift mode estimation; those that surpass a certain threshold ("Thresh Single" in the detector parameters) form the initial object hypotheses. If the MDL hypothesis verification stage is activated (check boxes "Do MDL Selection" and "Rej. Savings<" in the "Hypothesis Selection" tab), the system then verifies the hypotheses and only accepts those that surpass a second threshold ("Rej. Savings<"). Details of the underlying algorithm can be found in the papers mentioned at the beginning of this document. The system draws a rectangle around every detection and opens a window to display the results in detail. This window shows one automatically computed top-down segmentation for each of the accepted hypotheses. Additional outputs can be displayed by selecting the corresponding visualization options in the "Output" and "Display" tabs (see Section II.1.7). I.2.) Important Detector Parameters ----------------------------------- When loading one of the provided detectors, all of its parameters will automatically be set to their default settings (optimized for our training/test data sets). When imaging conditions differ considerably from the training conditions, it may be beneficial to adapt some parameters. This section introduces the most important parameters and describes their effects. All other parameters can (and should) in general be left at their default settings. I.2.1) Final Hypothesis Verification Threshold The most important recognition parameter is the final hypothesis verification threshold in the field "Reject Savings<" on the right-hand side of the main program window. This threshold defines the minimum MDL score a hypothesis must reach in order to be accepted (see II.1.14.b for a detailed description). Lowering this threshold causes more detection hypotheses to be returned. When applying the detectors to new test data, it is therefore often useful to lower this threshold in order to get a feeling for the approach's limits, i.e. to see which objects are still detected with a lower score and which ones are missed entirely. Note that when lowering this threshold, it is usually necessary to lower the individual detector thresholds, too, in order for this change to have an effect! This is described in the following section. I.2.2) Individual Detector Thresholds (Voting, MDL) Before passing hypotheses to the final verification procedure, each detector on its own already makes a pre-selection in order to filter out low-scoring detections that will not survive the verification anyway. This is done in two steps. Both steps can be influenced by adapting the corresponding parameters in the Detector window. First, an initial recognition threshold ("Reco --> Thresh Single", see II.2.12) specifies which maxima in the Hough voting space are kept as initial hypotheses.
For each such hypothesis, the program then computes a top-down segmentation and derives the hypothesis's MDL score (before interactions are taken into account). Second, this computed MDL score is checked against another threshold ("MDL --> Rej. Savings<", see II.2.13) in order to filter out hypotheses that are too weak. It is important to keep in mind that when the final verification threshold from I.2.1 is lowered, this second threshold also needs to be adapted. In addition, when imaging conditions differ considerably from the training conditions, the initial "Thresh Single" threshold can be relaxed to a lower setting in order to permit more initial hypotheses to survive until the MDL verification stage. However, a lower value of "Thresh Single" will increase computational load and thus slow down recognition. I.2.3) Area Factor Different object types may take up different image areas (e.g. a car's side view is much larger than its rear view). Since the final MDL verification integrates per-pixel likelihoods over the hypothesized object area, this may bias recognition scores in favor of larger categories. In addition, the absolute range of a detector's scores may also depend on the size and quality of its training set. When working with several different detectors, both of those influences need to be balanced out. This can be done by specifying a different "Area Factor" for each detector (in the Detector window, see II.2.3), which is taken as a normalization constant during the hypothesis combination procedure. Each detector's recognition scores will be divided by this area factor. I.2.4) Detector Scale Range The ISM detector automatically performs multi-scale image analysis. That is, it tries to detect objects of the target category regardless of their size in the image. However, this search over scales becomes more expensive the larger the scale search range is. The detection procedure can therefore be sped up if this range can be restricted by application-specific information. This can be done by adapting the fields "Reco --> min Scale" and "Reco --> max Scale" in the Detector window (see II.2.12). These fields determine the search scale range for recognition (relative to the training scale 1.0). The default range is [0.3,1.5]. If larger objects shall be recognized, the "max Scale" value needs to be increased. When doing that, the following two points have to be kept in mind. - The scale search range needs to be slightly larger than the effective object range, so that the corresponding object location and scale can be identified as a local maximum in Hough space. I would therefore recommend keeping a safety margin of about 0.5 for the "max Scale" value. - When the upper scale limit exceeds 2.0, the interest point scale range should also be adapted, as explained in Section I.2.5 below. I.2.5) Feature Extractor Scale Range For similar reasons as detailed in I.2.4, the employed interest point detectors are also applied with a limited scale search range. When trying to recognize objects at very large scales (e.g. in high-resolution images), those interest point detectors need to be adapted to extract features at larger scales. This can be done using the "Scale --> min Scale" and "Scale --> max Scale" fields in the Feature window (see II.3 and II.3.4 for details). The default setting for those values is [1.0,32.0] (allowing recognition under scale changes of up to a factor of 2, relative to the training scale).
If larger scale changes are to be tolerated during recognition, the "Max Scale" value needs to be increased accordingly (to 48.0 or 60.0 for scale factors up to 4 or 5, respectively). I.3.) Using a Ground Plane Calibration -------------------------------------- In general, detection performance can be improved considerably if scene geometry information in the form of a ground plane calibration is available. In order for the detector to use this information, three conditions have to be fulfilled. First, the ground plane information has to be available in a text file according to the specifications described in IV.2. Second, the detector options need to be set such that the detector makes use of the available 3D information (see II.2.5, II.2.7). Finally, the main GUI options should be set such that the calibration scale is correctly converted to meters (see II.1.15). We strongly advise using ground plane information wherever it is available, as this can bring a significant performance increase in both detection accuracy and run-time efficiency. I.4.) Caveats ------------- When experimenting with the provided detectors, it is important to keep in mind the approach's limitations and adjust one's expectations accordingly. - The ISM approach has been designed with the goal of detecting and localizing novel instances of a given visual category that are seen from the same viewpoint or aspect. This means that a detector trained on side views of cars will typically not be able to recognize rear or front views of cars. Thus, for many real-world categories, it will be necessary to combine several different detectors. On the ISM webpage, we provide example detectors for different viewpoints of cars (http://www.vision.ee.ethz.ch/bleibe/ism). For some other categories, different viewpoints may be sufficiently similar that the same detector reacts to all of them. For example, the provided pedestrian detector was trained mainly on side views, but it will often also react to front or rear views, although typically with lower confidence. - In addition, the provided detectors are sensitive to contrast. They typically work better in images where contrast is good. This is mainly a limitation of the underlying local feature extractors. If the input images are low in contrast, those detectors typically find fewer features, rendering object detection more difficult. Similarly, regions of very strong contrast (e.g. due to bright lighting and hard shadows in the image) often yield a large number of local features, which may bias the detector towards creating more hypotheses there. If the contrast settings of a test scenario are known, it is therefore advisable to adjust either the gamma factor of the input images or the detector parameters accordingly. - One major advantage of the ISM approach, compared to many monolithic detectors, is that it can recognize objects under significant partial occlusion. However, this also means that the approach may return false positives due to partial object structures. This may happen for example in pedestrian detection, where certain road markings contain shapes similar to a pedestrian's legs. We are currently working on an extension of the approach to reduce those effects. - In general, detection performance can be improved considerably if scene geometry information in the form of a ground plane calibration is available, as described in I.3. If such information can be obtained (even if it's not that accurate), we strongly advise using it in the detector. II.
In-Detail Description of the GUI ==================================== II.1.) Description of the Main Window Interface Elements -------------------------------------------------------- Left-Hand Side: --------------- II.1.1.) Load Test Image loads a new image and displays it in the main window. II.1.2.) Better RGB->Gray Conversion This option is just provided for compatibility reasons. Some older experiments were performed using the suboptimal RGB->Gray conversion according to the formula I = (R+G+B)/3. The newer version uses the more accurate conversion I = (0.3*R+0.59*G+0.11*B). In order to replicate those older experiments, the older formula can be selected by unchecking this checkbox. II.1.3.) Perform Gamma Normalization Dalal & Triggs reported a performance improvement in their detection system when performing a gamma normalization on their test images prior to feature extraction (Dalal & Triggs, CVPR'05). This option performs a similar gamma normalization, in which each pixel gray value is replaced by its square root prior to feature extraction. In my experiments, however, this option did not yield a consistent improvement. II.1.4.) Add Detector This button adds a new detector to the system. It opens a new "Detector" window, where the detector parameters can be set manually, or where a pre-defined detector can be loaded. See Section II.2 for details about the detector parameters. II.1.5.) Table of Detectors. This table contains an entry for each currently loaded detector. The table columns summarize the following information: - The detector's target category (e.g. "car", "motorbike"), - its target pose (e.g. "side", "rear"), - whether or not the detector is also applied to a mirrored version of the image (see II.2.8), - the detector's initial voting threshold (see II.2.12 below), - its assigned bounding box color. In general, several detectors can be loaded and executed in parallel. Each detector has its own parameter window, which can be opened by double-clicking the corresponding list entry (see II.2 for information about the detector GUI). The program is written with the goal of reusing as much existing information as possible. Thus, if two detectors are based on the same features, they will share the outputs of the feature detector. Also, if two detectors are based on the same codebook, they will share this codebook, so that the extracted features have to be matched to it only once. II.1.6.) Table of Cues. This table contains an entry for each currently loaded feature extractor. The table columns summarize the following information: - The employed interest point detector, - the feature descriptor, - whether or not the feature extractor is also applied to a mirrored version of the image, - the minimum feature scale, - the maximum feature scale. This table is updated automatically when new detectors are loaded. As stated above under II.1.5, the program tries to reuse as much information as possible. Thus, if two detectors are based on the same features, they will automatically share the underlying feature extractor. Each feature extractor also has its own parameter window, which can be opened by double-clicking on the corresponding table entry (see II.3 for information about the feature extractor GUI). II.1.7.) GUI Options The following options can be used to display additional outputs of the object detector. They do not affect the algorithm's results, but they may affect its runtime.
II.1.7.a) Drawing tab The first two options specify what should be drawn as the support of a hypothesis (see II.1.7.c). If "Draw Maps" is checked, the hypothesis segmentation will be shown. If "Draw Confidence" is checked, the p(figure) probability map is shown instead. If "Draw Tight BBoxes" is checked, the result image not only displays the regular bounding box for each hypothesis, but also draws the bounding box of the segmentation (which is sometimes more accurate). The last option "Eval. Unique Contrib." is only experimental and should be left unchanged. II.1.7.b) Output tab These options determine what output should be displayed on the command line. The first four options specify different levels of detail for this output: just the algorithm's "Main Steps", additional "Details", the intermediate "Voting Results", and the final "MDL Results". The final option "Show Timings" displays detailed timing results for the individual steps of the algorithm. II.1.7.c) Display tab These options allow displaying additional graphical output. The first option selects whether interest points should be drawn into the input image. The following three options open result windows displaying the Hough "Voting Space", each hypothesis's "Support" (see also II.1.7.a), and the result "Segmentations". The last option additionally displays each accepted hypothesis's result segmentation in the bottom part of the program window. WARNING: If the input images are very large (such as those from the MIT LabelMe Database), the additional outputs may take up a lot of space and may eventually crash the computer when its memory limit is reached. II.1.8.) Save Images saves the currently displayed images under a given file name. II.1.9.) Save Segmentations saves the segmentation for the current result image in 3 separate files: one for the p(figure) probability map, one for the p(ground) probability map, and one for the final segmentation. Only one file name needs to be specified -- the others are generated automatically. Right-Hand Side: ---------------- II.1.10.) Process Test Image starts the recognition process. The system asks for a test image and applies each of the loaded detectors in sequence. For each detector, it extracts and matches image features to the detector's codebook, then generates probabilistic votes for the position of the object center in a Hough voting space, extracts maxima from this space as initial hypotheses, and computes a top-down segmentation for each generated hypothesis. The resulting hypotheses are then pooled and combined in an MDL hypothesis verification procedure (see II.1.14). The basic idea behind this hypothesis combination and verification procedure is that each pixel can belong to at most one object. Thus, all hypothesized detections compete for pixels, which results in interaction costs. The algorithm tries to find the optimal combination of hypotheses, such that the total sum of their scores (their "savings" in the terminology of the algorithm) minus their interaction cost is maximized. The final acceptance decision is made based on the threshold in II.1.14. As objects occurring at different scales take up different portions of the image, an automatic scale normalization is performed as part of the algorithm. However, different object types may also take up different image areas (e.g. a car's side view is much larger than its rear view). In addition, the absolute range of a detector's scores may also depend on the size and quality of its training set.
When working with several different detectors, both of those influences need to be balanced out by hand. This can be done by specifying a different "Area Factor" for each detector (see Section II.2.3), which is taken as a normalization constant during the hypothesis combination procedure. II.1.11.) Perform IDL Test starts a test series on a whole set of images, where the image set is specified by an annotation file in a special file format (suffix ".idl", details on this format are given in Section IV.1). By default, the function generates one result file containing the detection bounding boxes for each image, together with the final hypothesis scores. Optionally, two additional kinds of output can be written to disk using the following checkboxes. II.1.12.) Write Result Images This option stores the result images with detection bounding boxes in png format. The program asks for a result directory in which to store those images. In addition, the program writes out a Matlab-readable text file containing more detailed information about each detection (the corresponding file format is described in Section IV.3). II.1.13.) Write Segmentations This option writes out the result segmentations as a Matlab workspace (suffix ".mat", see Section IV.4 for details) for each image. As in II.1.12, the program additionally writes out a Matlab-readable text file containing more detailed information about each detection (the corresponding file format is described in Section IV.3). II.1.14.) Hypothesis Selection tab This tab contains options and parameters for the final MDL hypothesis verification. If only a single detector is used, this stage is identical to the one described in our IJCV paper. If multiple detectors are run in parallel, the same stage can also be used to combine their detection results. In that case, however, it becomes important to balance out their outputs. This is necessary, as different object types may take up different image areas (e.g. a car's side view is much larger than its rear view). In addition, the absolute range of a detector's scores may also depend on the size and quality of its training set. When working with several different detectors, both of those influences need to be balanced out by specifying a different "area factor" for each detector (see Section II.2.3), which is taken as a normalization constant during the hypothesis combination procedure. II.1.14.a) Do MDL Selection This option determines whether the outputs of the individual detectors should be combined in the final MDL hypothesis selection stage, or if all candidate hypotheses should simply be displayed without MDL verification. This option is useful for visualizing the effect of the MDL stage. It can also be useful if the detector results shall be read into Matlab and combined with additional information there (in that case, the option from II.1.13 should be checked). II.1.14.b) Hypothesis Selection options determine which method for hypothesis verification is used. Two options are possible: the MDL criterion (check boxes "Do MDL Selection" and "Rej. Savings<" selected) and/or the Bounding Box criterion (check box "Rej. Overl>" selected). The text fields specify the percentage of overlap for the bounding box criterion and the minimum MDL score for a valid hypothesis, respectively. The parameter "K2/K0" determines how much the MDL criterion should trust the size of a segmentation as opposed to its supporting p(figure) score.
It can be varied between about 0.90-0.95 (both count equally) and 1.0 (only the p(figure) score counts). II.1.15.) Ground Plane tab This tab contains two parameter fields that are used when working with a ground plane calibration. Note that the use of a ground plane requires a calibration file to be available (see IV.2 for details on the calibration file format). In addition, the detectors need to be explicitly set to use the ground plane (see II.2.5, II.2.7). For many test datasets used in our experiments, we rescaled all images to twice their original size for object detection. In those cases, however, the calibration files still refer to the original image size. Therefore, all image coordinates need to be divided by a factor of 2 prior to applying the calibration. This can be accommodated by setting the "Image Scale" parameter to 2. The second field "World Scale" is used to convert the calibrated world coordinates to meters. Depending on the test set, it may be necessary to adjust this parameter, so that the object size prior can be properly expressed during recognition. II.1.16.) Verification tab The options on this tab can be used to activate the Chamfer verification described in our CVPR'05 paper (if a set of silhouettes has been loaded, see II.1.17). However, the Chamfer verification code is not optimized and runs very slowly. In addition, advances on the feature detection side have in the meantime improved detector performance to a level where the Chamfer verification brings no further advantage. This option is therefore not recommended and is only left in for compatibility purposes. II.1.17.) Load Silhouettes Loads a set of silhouettes for the Chamfer verification stage. This functionality is left in the program for compatibility purposes, but should not be used in regular experiments. (The silhouettes needed for this are not included in the regular code distribution, but may be provided upon request.) II.1.18.) Display Scale Footprint displays a histogram of the interest point scales detected in the current image. Useful for debugging purposes. II.1.19.) Quit terminates the program. II.2.) Description of the Detector Window Interface Elements ------------------------------------------------------------ This window contains the parameter settings for a single detector. It can be accessed by double-clicking on the corresponding entry in the Table of Detectors (see II.1.5). The optimal settings for the provided codebooks are set automatically upon loading the corresponding detectors. II.2.1.) Category specifies a name for the detector's target category (e.g. "cars", "cows", ...). II.2.2.) Pose specifies a name for the detector's target pose (e.g. "side", "rear", ...). II.2.3.) Area Factor Different object types may take up different image areas (e.g. a car's side view is much larger than its rear view). Since the final MDL verification integrates per-pixel likelihoods over the hypothesized object area, this may bias recognition scores in favor of larger categories. In addition, the absolute range of a detector's scores may also depend on the size and quality of its training set. When working with several different detectors, both of those influences need to be balanced out. This can be done by specifying a different "Area Factor" for each detector, which is taken as a normalization constant during the hypothesis combination procedure. Each detector's recognition scores will be divided by this area factor. II.2.4.) Size(m) specifies the target category's mean size (in meters).
This parameter only has an effect when working with a ground plane calibration. II.2.5.) Size Variance specifies the target category's size variance (in m^2). The adjacent check box determines if this variance shall be used to weight the detection scores (or if just a hard ground plane corridor shall be used, see also II.2.7). This parameter only has an effect when working with a ground plane calibration and when the "Ground Plane Filter" option from II.2.7 is selected (see also II.1.15 and IV.2). II.2.6.) Dist. from Center specifies the distance from the object's bounding box footpoint to its 3D center point (in meters). This can be used in order to let several single-view detectors (e.g. "frontal car" and "semi-profile car") agree on a common object center. This parameter only has an effect when working with a ground plane calibration. II.2.7.) Use Ground Plane Filter specifies whether the ground plane shall be used in order to limit object detection to a corridor in the (x,y,scale) volume. This parameter only has an effect when a ground plane calibration is available. If such a calibration is available, the "Size Variance" from II.2.5 can additionally be used in order to weight the detection scores accordingly (see also II.1.15 and IV.2). II.2.8.) Mirror Image When this option is selected, the detector will additionally be applied to a mirrored version of the input image. This is convenient, since the detectors then need to be trained only for a single direction (e.g. for cows walking left). II.2.9.) Add Cue This button can be used to build up a custom detector by adding another cue to it. Here, a "cue" means the combination of a pre-trained codebook and occurrence file based on the same basic features (i.e. the same interest point extractor and feature descriptor). There is no restriction on the number or type of cues that can be added. In practice, however, a detector usually consists of between 1 and 3 cues. The way cues are combined is described in our BMVC'06 paper. II.2.10.) Table of Cues This table contains an entry for each cue assigned to this detector. The table columns summarize the following information: - The employed interest point detector, - the feature descriptor, - whether or not the feature extractor is also applied to a mirrored version of the image, - the codebook size (number of stored cluster centers), - the number of stored occurrences. This table is updated automatically when new cues are added. As stated above under II.1.5, the program tries to reuse as much information as possible. Thus, if two detectors are based on the same features, they will automatically share the underlying feature extractor. Each feature extractor also has its own parameter window, which can be opened by double-clicking on the corresponding table entry (see II.3 for information about the feature extractor GUI). In addition, this table is linked to the Table of Cues in the main GUI (see II.1.6). When changing the min or max scale of a feature extractor in any of the loaded detectors, the corresponding values are automatically updated in the main GUI table (note that the "Return" key may have to be pressed for this to happen). However, the opposite is not true. II.2.11.) MSME Tab The first 5 fields determine the radius of the MSME kernel window in x-, y-, scale-, aspect-, and rotation-direction. This window size corresponds to a tolerance against small alignment changes. The current settings are optimized for a training object size of ~200 pixels (maximum of width and height).
For regular scale-invariant recognition, only the first 3 size parameters are used. The aspect and rotation window sizes are only used for research purposes and are not fully functional. The subsequent check boxes are responsible for performing the correct scale normalization. In general, the parameters on this tab should be left at their default settings. II.2.12.) Reco Tab This tab contains various recognition parameters. The initial recognition threshold ("Thresh Single") specifies which maxima in the Hough voting space are kept as initial hypotheses. The optimal value for the provided codebooks is set automatically upon loading the corresponding detectors. When imaging conditions differ considerably from the training conditions, this value can be relaxed to a lower setting in order to permit more initial hypotheses to survive until the MDL verification stage. When the MDL verification is selected, the exact value of this threshold is not as important anymore, since the MDL stage is powerful enough to reject additional false positives. However, a higher value of "Thresh Single" will reduce computational load and thus speed up recognition. The two fields "Obj. Width" and "Obj. Height" determine the size of the detection bounding box that is drawn for each detection. When using one of the provided detectors, these parameters should be left unchanged. "Extend Rg." can be ignored. The fields "min Scale" and "max Scale" determine the search scale range for recognition (relative to the training scale). The default range is [0.3,1.5]. If larger objects shall be recognized, the "max Scale" value must be increased (note that when the upper scale limit exceeds 2.0, the interest point scale range must also be adapted, see II.3.4). The "min Vote Wt." and "max Vote Wt." fields, finally, should be left unchanged. II.2.13.) MDL Tab This tab originally specified which method for hypothesis verification was used. However, since the newest version of our code can execute several detectors in parallel, this functionality has been transferred to the main GUI (see II.1.14). Most options on this tab can therefore be ignored. The only important remaining parameter is the "Rej. Savings<" field. This parameter defines a threshold restricting which hypotheses are passed on to the main program for hypothesis verification, already using the same score as in the later MDL stage. When loading a predefined detector, this value is initialized to its optimal setting for our test data sets. However, when imaging conditions differ considerably from the training conditions, this threshold can be adapted to a lower setting in order to allow more hypotheses to survive until the final MDL verification stage. The field "adapt for scales>" governs the scale normalization and should be left unchanged. The final option determines if a cubical or spherical/ellipsoidal MSME kernel shall be used. In our experience, both perform equally well, but the cubical kernel can be evaluated faster. II.2.14.) Misc Tab This tab only contains one relevant parameter: the checkbox "Use fast MSME" determines whether a fast approximation shall be used instead of the exact values in order to select the initial MSME starting locations. This option can generally be recommended, as it brings a considerable speedup without affecting recognition performance too much. The other parameters on this tab belong to experimental options which should be left at their default settings. II.2.15.)
Misc2 Tab This tab contains several experimental options which should be left at their default settings. II.2.16.) Load Button This button can be used to load a predefined detector (detector files can be recognized by the suffix ".det"). Once the detector is fully loaded, it will be displayed in the detector table on the left side of the main program window. Note that when loading a detector, all GUI parameters will be automatically set to the saved default values for this detector. If the imaging conditions of the test data vastly differ from those used during training, it may be beneficial to adapt some of those parameters. How that is done is explained in Section I.2. II.2.17.) Save Button This button can be used to save the current detector parameters, including all codebooks. II.2.18.) Clear Button This button removes all cues loaded for this detector and clears the parameter fields. II.3.) Description of the Feature Window Interface Elements ----------------------------------------------------------- This window contains the parameters for a single feature extractor. It can be accessed by double-clicking the corresponding entry in the "Table of Cues" of either the main GUI (see II.1.6) or a detector window (see II.2.10). Note that since the loaded detectors are based on specific interest point extractors and feature descriptors, changing those settings will result in unpredictable behavior and may crash the program. The only fields which sometimes need to be adapted in this window are the "Min Scale" and "Max Scale" fields (see II.3.4). II.3.1.) Detector tab Local feature extraction can be done using either a uniform sampling scheme or different interest point detectors. For the Harris, Exact-DoG, and SURF detectors, the "Param." tab reveals a set of more detailed options ({"Sigma1", "Sigma2", "Alpha", "Thresh"}, {"Scale Octaves", "Levels/Octave", "Sigma0", "Threshold"}). In practice, they can be left unchanged. All other interest point detectors are operated at their default settings. If segmentation masks are available for the training images, the option "Use only figure area" can be used to keep only interest points which sufficiently overlap with the object (where "sufficiently" is defined by the value of "Min. Figure Pixels" in the "Params" tab; this value specifies the minimum number of object pixels relative to a 25*25 pixel patch). II.3.2.) Features tab These options determine the type of features that are computed for each interest region. Many options are available here, from simple 25*25 image patches, to SIFT, GLOH, or Shape Context features. The option "Make Rotation Invariant" can be used for all features (except "Patches") to obtain a rotation-invariant representation. II.3.3.) Params tab contains parameters for certain interest point operators (as explained in II.3.1). II.3.4.) Scale tab The "Min Scale" and "Max Scale" fields can be used to determine the scale range of the interest point detector. When training a new detector, it is often useful to restrict the scale range, e.g. to [1.9,16.0]. For later recognition, these values should, however, be set to a larger scale range of e.g. [1.0,32.0] (allowing recognition under scale changes of up to a factor of 2, relative to the training scale). If larger scale changes are to be tolerated during recognition, the "Max Scale" value needs to be increased accordingly (to 48.0 or 60.0 for scale factors up to 4 or 5, respectively). III.
Description of the Command-Line Interface ============================================== In addition to the interaction possibilities via the GUI, the program parameters can also be set directly via command line options. This even makes it possible to run the program entirely without a GUI.
USAGE: mcmatcher [OPTIONS]
-nw       : no gui
-t T      : set final recognition threshold to T (see I.2.1)
-minsc S  : set min detection scale to S (rel. to training scale, see I.2.4)
-maxsc S  : set max detection scale to S (rel. to training scale, see I.2.4)
-nomdl    : disable MDL hypothesis selection stage (see II.1.14.a)
-imagesc S: image scale factor for ground plane calculation (see II.1.15)
-worldsc S: world scale factor for ground plane calculation (see II.1.15)
-det FILE : load detector from FILE (can occur several times to add more detectors, see II.1.4)
-img FILE : process single image from FILE (see II.1.10)
-idl FILE : process a set of test images from IDL FILE (see II.1.11)
-out FILE : result IDL file for output
-odir DIR : result directory for detailed output
-timings F: enable (F=1) or disable (F=0) timing output
-q        : quiet mode (no text output)
-v        : verbose output
-vv       : very verbose output
Additional parameters can be made available via command line options upon request. IV. Description of the File Formats Used ======================================== In the following, we describe the different file formats that are used in our program. IV.1) IDL Format ---------------- The IDL files are used for three purposes: for specifying a list of test images to process in sequence (see II.1.11), for storing the recognition results of such a test run, and for storing the ground truth annotations of the sequence (in a different file). For each image, the file format lists a set of bounding boxes + recognition scores, separated by commas. Each box is given by an upper-left and a lower-right corner, but the corner coordinates are not necessarily sorted accordingly. A semicolon ends the list of bounding boxes for a single image; a period ends the file.
"filename": (x1, y1, x2, y2):score, (x1, y1, x2, y2):score, ...;
If there are no annotations (as in the case when just a batch list of test images shall be specified), each line just contains the file name, followed by a semicolon.
"filename";
The ground truth annotations, finally, do not contain score values, so the corresponding lines boil down to the following format.
"filename": (x1, y1, x2, y2), (x1, y1, x2, y2), ...;
A simple Matlab reader for the IDL format is available here: http://www.vision.ee.ethz.ch/~aess/iccv2007/readIDL.m IV.2) Calibration File Format ----------------------------- Many test sets that are available from our website (http://www.vision.ee.ethz.ch/~bleibe/data/datasets.html) come with a subdirectory "maps" containing either a single calibration file "camera.default" (in the case of a static camera), or a separate calibration file "camera.XXXXX" for every frame (in the case of a moving camera). In the latter case, those calibrations were automatically obtained using the Structure-from-Motion approach by Cornelis et al., CVPR'06. Calibration files contain the calibration for one image at a time (K [3x3], rad [1x3], R [3x3], t [1x3], GP [1x4]), with K the internal calibration, rad the radial distortion coefficients, R/t the external calibration, world -> camera (i.e. X_cam = R X_world + t), and GP the ground plane coordinates (in the form GP(1:3)x - GP(4) = 0).
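To make the role of these quantities concrete, the following short Matlab sketch illustrates how the stored parameters relate to each other. It is only an illustration and not part of the program: the variable names and numeric values are made up, radial distortion is ignored, and the assumption that GP is expressed in world coordinates should be verified against read_camera.m (described below).
% Illustration only: made-up example values in place of a real calibration file.
K  = [800 0 320; 0 800 240; 0 0 1];   % internal calibration [3x3]
R  = eye(3);                          % rotation, world -> camera [3x3]
t  = [0 0 0];                         % translation [1x3]
GP = [0 -1 0 1.2];                    % ground plane [1x4]
X_world = [2.0; 0.0; 8.5];            % some 3D point in world coordinates (made up)
% world -> camera transformation, as stated above: X_cam = R * X_world + t
X_cam = R * X_world + t(:);
% perspective projection with K (the radial distortion "rad" is ignored here)
x_hom = K * X_cam;
x_img = x_hom(1:2) / x_hom(3)         % pixel coordinates in the original image size
% a world point X lies on the ground plane if GP(1:3)*X - GP(4) = 0;
% the left-hand side is zero for points on the plane
gp = GP(:);
plane_residual = gp(1:3)' * X_world - gp(4)
Remember that if the test images were rescaled (see the "Image Scale" parameter in II.1.15), image coordinates must first be divided by that factor before they are compared against projections obtained with this calibration.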
For your convenience, we provide the Matlab function read_camera.m (available in the subdirectory "matlab" or from http://www.vision.ee.ethz.ch/~bleibe/data/read_camera.m), which demonstrates how to read in the camera parameters. Please note that for many datasets, we rescaled all images to twice their original size for object detection. In those cases, the calibration files still refer to the original image size. Therefore, all image coordinates need to be divided by a factor of 2 prior to applying the calibration, which can be done by setting the "Image Scale" parameter to 2 (see II.1.15). For the static sequences, the world scale is already expressed in meters. IV.3) Matlab-Readable Recognition Result Format ----------------------------------------------- This kind of result file is written out to disk when processing an image list (see II.1.11) and either option II.1.12 or II.1.13 is selected. The file format is a tab-separated text matrix containing the following information, where each line corresponds to one object hypothesis. An example Matlab script for loading in result files in this format is available in the file "load_detections.m" in the "matlab/" subdirectory distributed with this archive.
Column  Content
 1 - Image number
 2 - Hypothesis number
 3 - Object center, x coordinate
 4 - Object center, y coordinate
 5 - Object scale
 6 - Object category (the first loaded detector has the label "0", etc.)
 7 - Object bounding box, top left, x coordinate
 8 - Object bounding box, top left, y coordinate
 9 - Object bounding box, bottom right, x coordinate
10 - Object bounding box, bottom right, y coordinate
11 - Initial voting score
12 - Final MDL score
13 - Real-world distance to object footpoint (requires calibration)
14 - Real-world object height (requires calibration)
15 - Real-world object top point, x coordinate
16 - Real-world object top point, y coordinate
17 - Real-world object top point, z coordinate
18 - Real-world object footpoint, x coordinate
19 - Real-world object footpoint, y coordinate
20 - Real-world object footpoint, z coordinate
21 - Real-world object main axis direction, x coordinate
22 - Real-world object main axis direction, y coordinate
23 - Real-world object main axis direction, z coordinate
IV.4) Matlab Workspace Format for Storing Result Segmentations -------------------------------------------------------------- When performing a test run over an entire list of input images, option II.1.13 can be selected to store a result segmentation for each obtained detection. This will write out a Matlab workspace for each detection, containing the following data fields:
pfig     - Figure probability map "pfig"
pfig_xmn - x offset of pfig
pfig_ymn - y offset of pfig
pgnd     - Ground probability map "pgnd"
pgnd_xmn - x offset of pgnd
pgnd_ymn - y offset of pgnd
In order to save space, only the rectangular part of the pfig and pgnd maps that actually contains non-zero entries is stored in the workspace. In order to reconstruct the full maps, one therefore has to copy the stored content into an image-sized array, e.g. as follows:
[imh, imw] = size(image);
[h, w] = size(pfig);
pfig_full = zeros(imh, imw);
pfig_full(pfig_ymn:pfig_ymn+h-1, pfig_xmn:pfig_xmn+w-1) = pfig;
Feb 29, 2008 Bastian Leibe