PARALLEL AND DISTRIBUTED DATA MINING
The need for high-performance DM techniques grows as the
PDM
In this view, PDM has its central target in the exploitation of massive and possibly fine-grained parallelism, paying closer attention to work synchronization and load balancing, and exploiting high-performance I/O subsystems where available. PDM applications deal with large and hard problems, and they are typically designed for
By contrast, DDM techniques use a coarser computation grain and make looser assumptions about the interconnection network. DDM techniques are often
Integration of Parallel Tools into Data Mining Environments
It is now recognized that a crucial issue in the effectiveness of DM tools is the degree of interoperability with conventional databases, data warehouses and OLAP services. Maniatty and Zaki (2000) state several requirements for parallel DM systems, and the issues related to the integration are clearly
Pushing more of the computational effort into the data management layer means exploiting the internal parallelism of modern database servers. However, the scalability of such servers to massive parallelism is still a matter of research. While integration solutions are now emerging for sequential DM, this is not yet the case for parallel algorithms.
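To make the idea of pushing computation into the data management layer concrete, the following minimal sketch compares two ways of computing item support counts, a basic step in frequent-pattern mining. It is an illustration under stated assumptions, not a method taken from the works cited here: the `transactions` table, its `tid` and `item` columns, and all function names are hypothetical, and SQLite stands in for a database server only because it ships with Python. SQLite is not itself a parallel server; on a parallel one, the same GROUP BY query is the kind of operation the server could spread across its own workers.

```python
import sqlite3


def support_counts_in_db(conn, min_support):
    """Count item support inside the database with one GROUP BY query.

    Only the aggregated (item, support) pairs leave the database.
    """
    rows = conn.execute(
        "SELECT item, COUNT(DISTINCT tid) "
        "FROM transactions "
        "GROUP BY item "
        "HAVING COUNT(DISTINCT tid) >= ?",
        (min_support,),
    ).fetchall()
    return dict(rows)


def support_counts_in_client(conn, min_support):
    """Compute the same result by shipping every row to the client."""
    seen = set()
    counts = {}
    for tid, item in conn.execute("SELECT tid, item FROM transactions"):
        if (tid, item) not in seen:  # count each item once per transaction
            seen.add((tid, item))
            counts[item] = counts.get(item, 0) + 1
    return {item: n for item, n in counts.items() if n >= min_support}


def build_demo_db():
    """Create a small in-memory table of (transaction id, item) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (tid INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?)",
        [
            (1, "bread"), (1, "milk"),
            (2, "bread"), (2, "beer"),
            (3, "bread"), (3, "milk"), (3, "beer"),
            (4, "milk"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(support_counts_in_db(conn, min_support=3))
    print(support_counts_in_client(conn, min_support=3))
```

Both functions return the same dictionary. The difference is where the work happens: the first transfers one row per frequent item, while the second transfers the whole table and does the counting in the mining tool.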
The bandwidth of I/O subsystems in parallel architectures is theoretically much higher than that of sequential ones, but a conventional file system or DBMS interface cannot easily exploit it. Doing so requires new software support that is still far from standardized and is sometimes architecture specific. Parallel file systems and high-performance interfaces to parallel database servers are important resources to exploit for PDM. DDM must also take into account remote data servers, data transport