Journal article Open Access

A dual approach to cluster discovery in point event data sets

Brimicombe, Allan J.

DCAT Export

<?xml version='1.0' encoding='utf-8'?>
<rdf:RDF xmlns:rdf="" xmlns:adms="" xmlns:cnt="" xmlns:dc="" xmlns:dct="" xmlns:dctype="" xmlns:dcat="" xmlns:duv="" xmlns:foaf="" xmlns:frapo="" xmlns:geo="" xmlns:gsp="" xmlns:locn="" xmlns:org="" xmlns:owl="" xmlns:prov="" xmlns:rdfs="" xmlns:schema="" xmlns:skos="" xmlns:vcard="" xmlns:wdrs="">
  <rdf:Description rdf:about="">
    <rdf:type rdf:resource=""/>
    <dct:type rdf:resource=""/>
    <dct:identifier rdf:datatype=""></dct:identifier>
    <foaf:page rdf:resource=""/>
        <rdf:type rdf:resource=""/>
        <foaf:name>Brimicombe, Allan J.</foaf:name>
        <foaf:givenName>Allan J.</foaf:givenName>
    <dct:title>A dual approach to cluster discovery in point event data sets</dct:title>
    <dct:issued rdf:datatype="">2007</dct:issued>
    <dct:issued rdf:datatype="">2007-01-01</dct:issued>
    <owl:sameAs rdf:resource=""/>
        <skos:notation rdf:datatype=""></skos:notation>
    <owl:sameAs rdf:resource=""/>
    <dct:description>Spatial data mining seeks to discover meaningful patterns in data where a prime dimension of interest is geographical location. Consideration of a spatial dimension becomes important where data refer to specific locations and/or have significant spatial dependence that needs to be considered if meaningful patterns are to emerge. For point event data there are two main groups of approaches to identifying clusters. One stems from the statistical tradition of classification, which assigns point events to a spatial segmentation. A popular method is the k-means algorithm. The other broad approach searches for 'hot spots', which can be loosely defined as a localised excess of some incidence rate. Examples of this approach are GAM and kernel density estimation. This paper presents a novel variable resolution approach to 'hot spot' cluster discovery which acts to define spatial concentrations within the point event data. 'Hot spot' centroids are then used to establish additional distance variables and initial cluster centroids for a k-means classification that produces a segmentation, both spatially and by attribute. This dual approach is effective in quickly focusing on rational candidate solutions for the value of k and the choice of initial candidate centroids in the k-means clustering. This is demonstrated through the analysis of a business transactions database. The overall dual approach can be used effectively to explore clusters in very large point event data sets.</dct:description>
    <dct:accessRights rdf:resource=""/>
      <dct:RightsStatement rdf:about="info:eu-repo/semantics/openAccess">
        <rdfs:label>Open Access</rdfs:label>
        <dcat:accessURL rdf:resource=""/>