Open and Closed Data Sources: Documents in different formats, TV, Radio, and Information in closed legacy systems are the data sources to be mined and evaluated by CAPER. In addition to general Internet data sources, CAPER will also integrate mass media, internal LEA information systems and access to Semantic Web data collections.
Data Acquisition: Depending on the information source type, different acquisition patterns will be applied to ensure that the acquired information is as rich as possible and in a format suitable for analysis.
Information Analysis: Each analysis module is geared towards a specific content type, i.e. Text, Image, Video, Audio and Speech or Biometric data. In addition, modules interact with the ‘Semantic mash-up’ component so as to link analysed data to Semantic Web data, in order to support evaluation of the content irrespective of its source.
Information and Reference Repositories: Both the source data (when required) and the information mined by the information analysis modules will be stored in these repositories, separated by content type. The CAPER lexical resources, exploited by the ‘Multilingual Text Analysis’ module, will be mapped to URI-based Semantic Web global entity identifiers. Repositories will also store the reference images, text, keywords, biometric data etc. of interest to the LEAs, defined during the configuration of the monitoring and analysis requests.
Interoperability and Management Application: This is the end users’ workbench. Built on a web-based collaborative platform, it will allow Law Enforcement Officers to create and configure their monitoring requests and analysis petitions. The application will incorporate the workflow needed for structured collaboration between LEAs, underpinned by an organisational semantics component to ensure that their actions remain within the legal and procedural limits to be modelled in WP7. Through this application, users will also be able to configure the analysis of their own internal closed information sources and control how and with whom data is shared. In addition, the workflow element of the application will help them serve information analysis requests from other participating LEAs.
Visual Analytics (VA) and Data Mining (DM): Grouped under the management application, the VA and DM elements are key components of the CAPER platform, since they will provide the intelligence necessary to support the output of the system. They will allow LEA users to effectively mine processed data from both Closed and Open information sources, and to relate it further to Semantic Web sources when required. The VA component will provide visual representations of the analyses in summary and detailed forms, allowing LEA users to conceptualise the data more easily and intuitively.

Information Acquisition

The CAPER project will be designed from a linguistically neutral point of view. This design methodology will allow linguistic analysis and speech recognition components for any language to be added in the future. The initial languages to be supported by CAPER are listed in WP 5.2. This list of languages has been agreed with participating LEAs. CAPER will be capable of acquiring content from multiple sources, in multiple formats and in several languages. LEA users will provide reference images, keywords and biometric data, and will define the concepts to be used in information acquisition. For mass media capture, channel information will be required.

Information Processing

A key goal of the CAPER system will be to make like-for-like comparisons between different information types (text, image, voice, etc.) and between data with different cultural biases. This goes beyond data or web mining and will require tailorable algorithms that can make a broad classification of multiple data sources or of a specific piece of atomic content.

Information Exploitation

The CAPER project will contain a Visual Analytics component that will allow interactive decision making and hypothesis building. The VA work package will also include the development of mining and inference modules with a rich graphical interface for representing and interacting with evaluated information. It will allow both closed and open intelligence to be compared and contrasted. Visual Analytics has been successfully applied in the banking sector for Anti-Money Laundering and Credit Card Fraud applications, and in IP protection services to detect Trademark Abuse and Defamation.


Standardisation is essential for interchange of data and tools. Once a data format is accepted as a standard, tools can be developed and shared with little data conversion effort.
Less progress has been made, however, on the interoperability of unstructured content, e.g. multimedia data. In CAPER we will address this gap. CAPER will also standardise the processing of language-sensitive information by adopting KAF [8] (Knowledge Annotation Format), a multi-layered XML format for linguistic and semantic annotation of unstructured documents that has proven suitable for information processing. CAPER aims to extend data representation standards so that they also cope with multimedia and structured data.
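The idea behind a multi-layered annotation format can be illustrated with a minimal Python sketch of stand-off annotation: one layer holds the tokens, and a separate term layer refers to those tokens by identifier, so further layers can be added without altering the text. The element and attribute names used here (`doc`, `text`, `wf`, `terms`, `term`, `target`, `ref`) are illustrative assumptions and are not taken from the KAF specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical layered document: the element names are illustrative only,
# they are NOT the actual KAF schema.
SAMPLE = """
<doc lang="en">
  <text>
    <wf id="w1">money</wf>
    <wf id="w2">laundering</wf>
    <wf id="w3">case</wf>
  </text>
  <terms>
    <term id="t1" lemma="money laundering" type="crime">
      <target ref="w1"/>
      <target ref="w2"/>
    </term>
  </terms>
</doc>
"""

def resolve_terms(xml_text):
    """Return (lemma, type, surface form) for each term by following its token references."""
    root = ET.fromstring(xml_text)
    # Token layer: map each token identifier to its surface text.
    tokens = {wf.get("id"): wf.text for wf in root.find("text")}
    resolved = []
    # Term layer: each term points back into the token layer via <target ref="...">.
    for term in root.find("terms"):
        surface = " ".join(tokens[t.get("ref")] for t in term.findall("target"))
        resolved.append((term.get("lemma"), term.get("type"), surface))
    return resolved

print(resolve_terms(SAMPLE))
```

Because the term layer only references token identifiers, an image or audio analysis module could in principle contribute its own layer in the same way, which is the kind of extension to multimedia data that the text describes.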
More importantly for LEAs, WP9 of the CAPER work plan calls for a concerted effort to raise awareness and influence future policy. The results of the CAPER project and the experiences of the LEAs will be used to demonstrate to the LEA community the pros and cons of such a platform and to make concrete recommendations to policy stakeholders for future law enforcement legislation and policing procedures. WP8 and WP7 have been designed to achieve these goals by combining LEAs, Legal Specialists, Technical and Scientific partners, and a partner specialised in Technology and Law.


[8] KAF was developed in the KYOTO project, an EU co-funded project (FP7 ICT Work Programme 2007 - Digital libraries and Content, Intelligent Content and Semantics objective).

Integration with Large Scale Systems

The use of best-of-class ETL (Extraction, Transformation and Load) tools provided by partner ALTIC assures that the system can be reasonably integrated with legacy information systems. Additional modules, such as IBM ICU, will provide the technologies required for consistent and correct conversion of locale-sensitive data between system components. Finally, the CAPER system will be built on industry-standard and open technologies and fitted with both a Web Services layer and traditional Java/C APIs.
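As a rough illustration of the extract, transform and load steps, the following Python sketch reads a legacy CSV export, normalises locale-dependent dates and decimal numbers to neutral forms, and loads the result into a database. It uses only the Python standard library as a stand-in and does not represent the ALTIC tools or IBM ICU; the `SOURCE_FORMATS` table, the column names and the `records` table are hypothetical.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical per-source conventions; a real deployment would obtain
# these from a locale library rather than a hand-written table.
SOURCE_FORMATS = {
    "es": {"date": "%d/%m/%Y", "decimal": ",", "thousands": "."},
    "us": {"date": "%m/%d/%Y", "decimal": ".", "thousands": ","},
}

def extract(csv_text):
    """Extract: read rows from a legacy CSV export."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(row):
    """Transform: convert the date to ISO 8601 and the amount to a float."""
    fmt = SOURCE_FORMATS[row["source"]]
    date_iso = datetime.strptime(row["date"], fmt["date"]).date().isoformat()
    digits = row["amount"].replace(fmt["thousands"], "").replace(fmt["decimal"], ".")
    return (row["source"], date_iso, float(digits))

def load(rows, conn):
    """Load: store the normalised rows in a single table."""
    conn.execute("CREATE TABLE IF NOT EXISTS records (source TEXT, date TEXT, amount REAL)")
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
    conn.commit()

# The same written date and amount mean different things in the two sources.
LEGACY_EXPORT = 'source,date,amount\nes,03/04/2011,"1.234,56"\nus,03/04/2011,"1,234.56"\n'

conn = sqlite3.connect(":memory:")
load([transform(r) for r in extract(LEGACY_EXPORT)], conn)
print(conn.execute("SELECT * FROM records ORDER BY source").fetchall())
```

The string `03/04/2011` is read as 3 April for the first source and as 4 March for the second, which is the kind of ambiguity that consistent locale handling between components is meant to remove.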

Secure Knowledge Sharing and Collaboration

The common management and workflow application and the analysts’ workbenches will be developed in compliance with prevailing secure systems standards and certification to guarantee intrinsic security. Their design will also address both national and European legal concerns. More importantly, this layer will use Semantic technologies to enhance the semantic interoperability of the system, ensuring that LEA users share a common language and set of concepts when using the system.

Legal Aspects

The project contains a specific work package (WP7) in which legal, ethical and societal issues will be identified. In collaboration with participating LEAs, these will be addressed in the system design and operational phases of the project. The system architecture foreseen is compartmentalised and will provide sufficient flexibility to react to any legal barriers that might arise during the execution of the project. The project will also produce policy and best practice recommendations based on this legal study.

User Focused

The project will be structured specifically around the needs of the participating Law Enforcement Agencies. Two work packages exist to achieve this goal: WP2, whose goal is to allow the LEAs to analyse and define their requirements, and WP8, which will allow the participating LEAs to perform integration tests and field trials of the CAPER platform in partnership with the wider consortium. The project concept has in part been shaped by the fact that several members of the project consortium are drawn from different disciplines and have existing experience in data security, Open Source Intelligence, data mining, content analysis and linguistics. The CAPER project strategy will be to engage the end users at the beginning of the project. They will first be shown which technologies are available to them; then, in cooperation with the CAPER consortium, the constituent components will be integrated and further developed to meet the LEAs’ needs.

To achieve these goals, the participating LEAs will:

  • Actively participate in the requirements gathering and user modelling tasks (WP2)
  • Actively participate from M12 in the testing and operation of the prototype CAPER platform (WP8)
  • Aid the project consortium in influencing national and European policy stakeholders
  • Provide expert knowledge on the challenges and obstacles to information sharing and cooperation amongst LEAs
  • Support key partners UAB and Baker McKenzie in understanding the legal frameworks in which LEAs operate
  • Jointly participate in scientific, industrial and standards initiatives during dissemination and exploitation efforts and take the lead in disseminating results to Policing Cooperation forums