===== Presentation =====

This dataset has been downloaded from http://www.cs.umd.edu/projects/linqs/projects/lbc/

author: Clément Grimal http://membres-lig.imag.fr/grimal/
Questions, suggestions or comments are appreciated!
date: February, 2012

===== Description =====

The archive contains 2708 documents assigned to 7 labels (Neural_Networks, Rule_Learning, Reinforcement_Learning, Probabilistic_Methods, Theory, Genetic_Algorithms, Case_Based). It is made of 4 views (content, inbound, outbound, cites) of the same documents. In the content view, the documents are described by a vocabulary of 1433 words. In the inbound, outbound and cites views, they are described by the 5429 links between them.

===== Files =====

All the files are encoded in UTF-8.

cora_content.mtx -- the documents-words matrix, containing 0/1 values indicating the absence/presence of a word in a document, in the Matrix Market coordinate format (sparse).

cora_inbound.mtx -- the matrix indicating by 0/1 values the inbound links between documents, in the Matrix Market coordinate format (sparse).

cora_outbound.mtx -- the matrix indicating by 0/1 values the outbound links between documents, in the Matrix Market coordinate format (sparse). It is the transpose of cora_inbound.mtx.

cora_cites.mtx -- the matrix of the number of citation links between documents, in the Matrix Market coordinate format (sparse). It is the sum of cora_inbound.mtx and cora_outbound.mtx.

documents-mapping.txt -- the mapping between the rows of the matrices and the ids of the documents in the original collection.

cora.txt -- contains the list of the assignments of the documents to a topic (label).

labels.txt -- contains the list of the different labels, in the order used by the assignments found in cora_act.txt.
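The relations stated for the link views (cora_outbound.mtx is the transpose of cora_inbound.mtx, and cora_cites.mtx is their sum) can be checked after extracting the archive. The following is a minimal sketch that uses only the Python standard library and assumes all .mtx files sit in one directory. The helper names (read_mtx, transpose, add, same, check_cora_views) are hypothetical and not part of the dataset. The reader only handles the real, integer and pattern fields with the general or symmetric layouts of the Matrix Market coordinate format. With SciPy installed, scipy.io.mmread can load these files directly as sparse matrices.

```python
import os
from typing import Dict, Iterable, Tuple

# A sparse matrix as (n_rows, n_cols, {(row, col): value}), with 0-based indices.
Sparse = Tuple[int, int, Dict[Tuple[int, int], float]]


def read_mtx(lines: Iterable[str]) -> Sparse:
    """Minimal reader for Matrix Market coordinate files (general or symmetric)."""
    it = iter(lines)
    header = next(it, "").lower().split()
    if len(header) < 5 or header[:3] != ["%%matrixmarket", "matrix", "coordinate"]:
        raise ValueError("not a Matrix Market coordinate file")
    field, symmetry = header[3], header[4]
    if field not in ("real", "integer", "pattern") or symmetry not in ("general", "symmetric"):
        raise ValueError(f"unsupported format: {field} {symmetry}")

    # Skip comment and blank lines up to the size line "rows cols nnz".
    size = next((ln for ln in it if ln.strip() and not ln.startswith("%")), None)
    if size is None:
        raise ValueError("missing size line")
    n_rows, n_cols, nnz = (int(x) for x in size.split())

    entries: Dict[Tuple[int, int], float] = {}
    count = 0
    for line in it:
        parts = line.split()
        if not parts or parts[0].startswith("%"):
            continue
        i, j = int(parts[0]) - 1, int(parts[1]) - 1  # indices are 1-based in the file
        value = 1.0 if field == "pattern" else float(parts[2])
        entries[(i, j)] = entries.get((i, j), 0.0) + value  # duplicates are summed
        if symmetry == "symmetric" and i != j:  # only one triangle is stored
            entries[(j, i)] = entries.get((j, i), 0.0) + value
        count += 1
    if count != nnz:
        raise ValueError(f"expected {nnz} entries, found {count}")
    return n_rows, n_cols, entries


def transpose(m: Sparse) -> Sparse:
    n_rows, n_cols, entries = m
    return n_cols, n_rows, {(j, i): v for (i, j), v in entries.items()}


def add(a: Sparse, b: Sparse) -> Sparse:
    if a[:2] != b[:2]:
        raise ValueError("shape mismatch")
    out = dict(a[2])
    for key, v in b[2].items():
        out[key] = out.get(key, 0.0) + v
    return a[0], a[1], out


def same(a: Sparse, b: Sparse) -> bool:
    """Equal shapes and equal non-zero entries (explicitly stored zeros are ignored)."""
    def nonzero(m: Sparse) -> Dict[Tuple[int, int], float]:
        return {k: v for k, v in m[2].items() if v != 0}
    return a[:2] == b[:2] and nonzero(a) == nonzero(b)


def check_cora_views(directory: str = ".") -> Tuple[bool, bool]:
    """Return (outbound == inbound^T, cites == inbound + outbound)."""
    def load(name: str) -> Sparse:
        with open(os.path.join(directory, name), encoding="utf-8") as f:
            return read_mtx(f)

    inbound = load("cora_inbound.mtx")
    outbound = load("cora_outbound.mtx")
    cites = load("cora_cites.mtx")
    return same(transpose(inbound), outbound), same(add(inbound, outbound), cites)
```

For example, check_cora_views("path/to/cora") should return (True, True) if the files match the description above. A dictionary keyed by (row, col) keeps the sketch dependency-free. For actual learning experiments, a SciPy sparse matrix is the more practical choice.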