My thesis's summary

Summary of my doctoral thesis

The goal of image understanding is to construct a structured description of a scene by analyzing one or more images. However, there is a large information gap between the input image and the scene description. Establishing the correspondences that bridge this gap requires various types of data structures and algorithms:

Image Processing : mathematical transformations and filtering operations for analyzing images represented as arrays of pixels
Segmentation / Image Analysis : detection of image features from iconic image data and geometric operations for vector data representing regions and lines
Recognition / Understanding : matching and classification for recognizing objects and reasoning about the structure of a scene using symbolic descriptions of the scene and object models.

Thus, realizing image understanding systems requires a large amount of computation, and parallel processing is needed to execute it at high speed. In designing parallel computer systems for image understanding, how to accommodate the varieties of data structures and algorithms listed above within one system architecture becomes a crucial problem.

One straightforward idea is to employ a heterogeneous system architecture, in which different types of parallel processing modules are provided for different types of operations. Such a heterogeneous architecture, however, makes it difficult to integrate the modules: how should data be shared and exchanged among them, and how should they be coordinated and synchronized?

In this thesis, we introduce a new parallel computer architecture for image understanding named the Recursive Torus Architecture (RTA for short). While RTA itself is a general parallel computer architecture for MIMD multi-microprocessor systems with distributed memories, the current goal of this research is to show its practical utility in the image understanding task, that is, to demonstrate that various types of operations, from low-level image processing to high-level spatial reasoning, can be executed efficiently on RTA, a single homogeneous architecture.

This thesis first describes the hardware design of RTA/1, a parallel machine with 1024 PEs (Processing Elements) designed on the basis of RTA. We also developed RTA/0, a small-scale prototype machine with 16 PEs, for performance evaluation. We then propose a scheme of data-level parallel processing on RTA/1 and demonstrate its utility by implementing complex parallel processes for bottom-up object recognition on RTA/0.
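As a rough illustration of what data-level parallel processing means here, the following minimal Python sketch divides an image into tiles, one per PE, lets each PE process only its own tile, and then combines the partial results. It is not the scheme proposed in the thesis: the 4 x 4 grid used for a 16-PE machine, the thresholding operation, and all function names are assumptions made only for this example, and the PEs are simulated sequentially.

```python
def split_into_tiles(height, width, grid_rows, grid_cols):
    """Map each PE coordinate (pr, pc) to a half-open tile (r0, r1, c0, c1)."""
    tiles = {}
    for pr in range(grid_rows):
        for pc in range(grid_cols):
            r0 = pr * height // grid_rows
            r1 = (pr + 1) * height // grid_rows
            c0 = pc * width // grid_cols
            c1 = (pc + 1) * width // grid_cols
            tiles[(pr, pc)] = (r0, r1, c0, c1)
    return tiles


def count_above_threshold(image, tile, threshold):
    """Work done by one PE: count pixels in its own tile above the threshold."""
    r0, r1, c0, c1 = tile
    return sum(
        1
        for r in range(r0, r1)
        for c in range(c0, c1)
        if image[r][c] > threshold
    )


def data_parallel_count(image, grid_rows, grid_cols, threshold):
    """Simulate the PEs one after another and combine their partial counts."""
    height, width = len(image), len(image[0])
    tiles = split_into_tiles(height, width, grid_rows, grid_cols)
    partial = {
        pe: count_above_threshold(image, tile, threshold)
        for pe, tile in tiles.items()
    }
    return sum(partial.values()), partial


# Hypothetical 8 x 8 test image whose pixel value is its raster index.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
total, partial = data_parallel_count(image, 4, 4, threshold=31)
```

On a real distributed-memory machine each tile would reside in the local memory of its PE, and the final summation would require communication between PEs, which is where the interconnection topology matters.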

The thesis discusses the following points.

In chapters 2 to 4, from the hardware viewpoint, we describe the hardware design and evaluate the performance of RTA/1, an MIMD parallel machine with distributed memories. Chapter 2 describes the fundamental ideas of RTA, that is, its definition and the theoretical characteristics of the communication distance between PEs; it also presents the coordinated parallel communication procedures that we devised to realize efficient communication between PEs. Chapter 3 shows the hardware design of RTA/1, a parallel image understanding machine. Chapter 4 describes RTA/0, a small-scale prototype machine with 16 PEs built for performance evaluation. Although RTA/0 is very small and its absolute computation and communication speeds were deliberately set low for easy and stable implementation, it yielded various interesting and promising experimental results.
In chapters 5 and 6, from the software viewpoint, we propose a scheme of data-level parallel processing on RTA/1. In general, data-level parallel processing is effective when a large amount of data is processed. Chapter 5 defines five types of operations as fundamental operation patterns. These operations are required for object recognition, and combinations of them realize efficient parallel processes. Chapter 6, based on the discussions and experimental results in chapter 5, designs a parallel object recognition process on RTA/1.
In chapter 7, integrating the hardware and the software, we implement a parallel object recognition process on RTA/0 and estimate the performance of RTA/1 from the experimental results. The evaluation leads to the following findings:
- The coordinated parallel communication procedures realize efficient communication between PEs. (chapter 2)
- Combinations of the fundamental operation patterns realize efficient parallel processes. (chapter 5)
Finally, in chapter 8, we summarize the utility of both the parallel image understanding machine RTA/1 and the design method of data-level parallel processes for object recognition. We also discuss problems left for future work.
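To give a feel for the communication distance between PEs mentioned for chapter 2, the following Python sketch computes hop counts on an ordinary two-dimensional torus, where each row and column wraps around. This summary does not define the recursive structure of RTA, so the sketch covers only the plain torus case and is not the analysis given in the thesis. The function names and the square layouts (4 x 4 for 16 PEs, 32 x 32 for 1024 PEs) are assumptions made for the example.

```python
def torus_distance(a, b, rows, cols):
    """Minimum hop count between PEs a and b on a rows x cols torus."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    # Along each axis a message may go directly or through the wrap-around link.
    return min(dr, rows - dr) + min(dc, cols - dc)


def mesh_distance(a, b):
    """Hop count on a mesh without wrap-around links, for comparison."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def torus_diameter(rows, cols):
    """Largest minimum hop count between any two PEs on the torus."""
    return rows // 2 + cols // 2


# Opposite corners of a 4 x 4 grid: far apart on a mesh, close on a torus.
corner_on_mesh = mesh_distance((0, 0), (3, 3))
corner_on_torus = torus_distance((0, 0), (3, 3), 4, 4)
```

With these definitions the opposite corners of a 4 x 4 grid are 6 hops apart on a mesh but only 2 hops apart on the torus, which shows how wrap-around links shorten the worst-case communication distance.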
