As highlighted in the International Principles on the Application of Human Rights to Communications Surveillance, logistical barriers to surveillance have decreased in recent decades and the application of legal principles in new technological contexts has become unclear. It is often feared that in light of the explosion of digital communications content and information about communications, or “communications metadata,” coupled with the decreasing costs of storing and mining large sets of data and the provision of personal content through third party service providers make state surveillance possible at an unprecedented scale. Communications surveillance in the modern environment encompasses the monitoring, interception, collection, preservation and retention of, interference with, or access to information that includes, reflects, arises from or is about a person’s communications in the past, present or future. There is ample evidence in the form of tenders for Internet Monitoring Systems (IMS) and Telecom Interception Systems (TCIS) put out by the Central government and various state governments that the Indian state is steadily turning into an extensive surveillance state.
While surveillance and intelligence gathering is essential for the maintenance of national security, the creation and working of a mass surveillance system as it is envisioned today may not necessarily be in absolute conformity with the existing law. A mass surveillance system like the Central Monitoring System (CMS) not only threatens to completely eradicate any vestige of the right to privacy but in the absence of a concrete set of procedural guidelines creates a tremendous risk of abuse.
Although information regarding the Central Monitoring System is quite limited on the public forum at the moment it can be gathered that a centralized system for monitoring of all communication was first proposed by the Government of India in 2009 as indicated by the Ministry of Communications & Information. Implementation of the system started subsequently and the Center for Development of Telematics (C-DOT) was entrusted with the responsibility of implementing the system. As per the C-DOT annual report 2011-12, research, development, trials and progressive scaling up of a Central Monitoring System were conducted by the organization in the past 4 years and the requisite hardware and CMS solutions which support voice and data interception have been installed and commissioned at various Telecom Service Providers (TSP) in Delhi and Haryana as part of the pilot project. Media reports indicate that the project will be fully functional by 2014. While an extensive surveillance system is being stealthily introduced by the state, several concerns with regard to its extent of use, functioning, and real world impact have been raised owing to ambiguities and wide gaps in procedure and law. Moreover, the lack of a concrete privacy legislation coupled with the absence of public discourse indicates the lack of interest of the state over the rights of an ordinary citizen.
The architecture and working of a proposed Internet Monitoring System must be examined in an attempt to better understand the functioning, capabilities and possible impact of a Central Monitoring System on our society and lives. This can perhaps allow more open discourse and a committed effort to preserve the rights of the citizens especially the right to privacy can be made while allowing for the creation of strong procedural guidelines which will help maintain legitimate intelligence gathering and surveillance.
Internet Monitoring System: Setup and Working
Very broadly, the Internet Monitoring System enables an agency of the state to intercept and monitor all content which passes through the Internet Service Provider’s (ISP) server which includes all electronic correspondence (emails, chats or IM’s, transcribed call logs), web forms, video and audio files, and other forms of internet content. The electronic data is stored and also subject to various types of analysis. While Internet Monitoring Systems are installed locally and their function is limited to specific geographic region, CMS will consolidate the data acquired from the different voice and data interception systems located across the country and create a centralized architecture for interception, monitoring and analysis of communications. Some parallels regarding the functioning of the CMS can be drawn from the the specifications revealed in the Assam Police tender document for the procurement of an Internet Monitoring System.
The deployment architecture of an Internet Monitoring System (IMS) contains probe servers which are installed at the Internet Service Provider’s (ISP) premises and the probes are installed at various tapping points within the entire ISP network. A collection server is also installed and hosted at the site of the ISP and it is used to either collect, analyze, filter or simple aggregate the data from the ISP servers and the data is transferred to a master aggregation server located a central data center. The central data center may also contain more servers specifically for analysis and storage. This type of architecture is being referred to as a ‘high availability clustered setup’ which is supposed to provide security in case of a failure or outage.
The Assam Police Internet Monitoring System tender document specifically indicates that the deployment in the state of Assam shall require 8 taps or probes to be installed at different ISPs, out of which 6 taps/probes shall be of 10 GBPS and 2 taps are of 1 GBPS. The document however mentions that the specifications are preliminary and subject to change.
Types of data
The proposed internet monitoring system of the Assam state can provide network traffic interception and a variety of internet protocols including Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), Simple Mail Transfer Protocol (SMTP), Internet Message Access Protocol (IMAP) and Session Initiation Protocol (SIP), Voice over Internet Protocol (VoIP) that can be intercepted and monitored and also support monitoring of Internet Relay Chat and various other messaging applications (such as Google Talk, Yahoo Chat, MSN Messenger, ICQ, etc.). The system can be equipped to capture and display multiple file types like text (.doc, .pdf), zipped (.zip) and executable applications (.exe). Further, information regarding login details, login pattern, login location, DNS address, routing address can be acquired along with the IP address and other details of the user.
Web crawling capabilities can be installed on the system which can provide data from various data sources like social networking sites, web based communities, wikis, blogs and other forms of web content on hosted applications can also be intercepted, monitored and analyzed. The system also allows capture of additional pages if updated; log periodical updates and other changes allowing monitoring agencies the capability of gathering internet traffic based on several parameters like Protocols, Keywords, Filters and Watch lists which is achieved by including phonetically similar words in various languages including local languages.
More specific functions of the IMS can include complete email extraction which will disclose the address book, inbox, sent mail folder, drafts folder, personal folders, delete folders, custom folders etc. and can also provide identification of dead drop mails. and allow country wise tracking of instant messages, chats and mails.
Regarding retention and storage of data, the tender document specifies that the system shall be technically capable of retaining the metadata of Internet traffic for at least one year and the defined traffic/payload/content is to be retained in the storage server at least for a week. However, the data may be retained for a longer period if required. The metadata and qualified data after analysis are integrated to a designated main intelligence repository for storage.
Types of Analysis
The Internet Monitoring System apart from intercepting all the data generated through the Internet Service Providers is essentially equipped for various types of data analysis. The solutions that are installed in the internet monitoring system provide the capability for real time as well as historical analysis of network traffic, network perimeter devices and internal sniffers. The kinds of analysis based on ‘slicing and dicing of data’ range from text mining, sentiment analysis, link analysis, geo-spatial analysis, statistical analysis, social network analysis, transaction analysis, location analysis and fusion based analysis, CDR analysis, timeline analysis and histogram based analysis from various sources.
The solutions installed in the IMS can monitor specific words or phrases (in various languages) in blogs, websites, forums, media reports, social media websites, media reports, chat rooms and messaging applications, collaboration applications and deep web applications. Phone numbers, addresses, names, locations, age, gender and other such information from content including comments can also be monitored. Social media profiles and information related to it can be extracted and a detailed ontology of all the social media profiles of the user can be created.
Based on the information, the analysis is supposed to provide the capability to identify suspicious behavior based on existing and new patterns as they emerge and are continuously applied to combine incoming and existing information on people, profiles, transactions, social network, type of websites visited, time spent on websites, type of content download or view and any other type of information that can be gathered. The solutions are also supposed to create single or multiple or parallel scenario build-ups that may occur in blogs, social media forums, chat rooms, specific web hosting server locations or URL, packet route that may be defined from time to time and such scenario build-ups can be based on parameters like sentiments, language or expressions purporting hatred or anti-national expressions, and even emotions which as may be defined by the agency depending on operational and intelligence requirement. Based on these parameters, automated alerts can be generated relating to structured or unstructured data (including metadata of contents), events, pattern discovery, phonetically similar words or phrases or actions from users.
Based on the data analysis, reports or dossiers can be generated and visual analysis allowing a wide variety of views can be created. Further, real time visualization showing results from real-time data can be generated which allows alerts, alert categories or discoveries to be ranked (high, medium, and low priority, high value asset, low value asset, moderate value asset, verified information, unverified information, primary evidence, secondary evidence, circumstantial evidence, etc.) based on criteria developed by the agency. The IMS solutions can also be capable of offering web-intelligence and open source intelligence and allow capabilities like simultaneous search capabilities.
Another important requirement mentioned in the tender document is the systems capability to integrate with other interception and monitoring systems for 2G, 3G/UMTS and other evolving mobile carrier technologies including fixed line and Blackberry services and encrypted IP services like Skype services.
A version of this post was published here.
(c) Centre for Internet and Society
The Centre for Internet and Society is a non-profit research organization that works on policy issues relating to freedom of expression, privacy, accessibility for persons with disabilities, access to knowledge and IPR reform, and openness (including open government, FOSS, open standards, etc.), and engages in academic research on digital natives and digital humanities.