Atiia, Ali A.; Hopper, Corbin; Inoue, Katsumi; Vidal, Silvia; Waldispühl, Jérôme
Virtually all molecular interaction networks (MINs), irrespective of organism or physiological context, have a majority of loosely-connected ‘leaf’ genes interacting with at most 1–3 genes, and a minority of highly-connected ‘hub’ genes interacting with 10 or more other genes. Previous reports proposed adaptive and non-adaptive hypotheses describing sufficient but not necessary conditions for the origin of this majority-leaves minority-hubs (mLmH) topology. We modelled the evolution of MINs as a computational optimization problem which describes the cost of conserving, deleting or mutating existing genes so as to maximize (minimize) the overall number of beneficial (damaging) interactions network-wide. The model 1) provides sufficient and, assuming $\mathcal{P}\neq \mathcal{NP}$, necessary conditions for the emergence of mLmH as an adaptation to circumvent computational intractability, 2) predicts the percentage of genes having $d$ interacting partners, and 3) when employed as a fitness function in an evolutionary algorithm, produces mLmH-possessing synthetic networks whose degree distributions match those of equal-size MINs.
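As an illustration of the quantity in point 2, the share of genes with exactly d interacting partners can be read off an interaction list. The sketch below is not the authors' model; the function names, the toy network, and the default leaf and hub thresholds are assumptions made for the example and can be adjusted.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {d: percentage of genes with exactly d interaction partners}."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # self-interactions are ignored in this sketch
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    counts = Counter(len(partners) for partners in neighbors.values())
    n_genes = len(neighbors)
    return {d: 100.0 * c / n_genes for d, c in sorted(counts.items())}


def leaf_hub_share(distribution, leaf_max=3, hub_min=10):
    """Percentage of leaf genes (degree <= leaf_max) and hub genes (degree >= hub_min).

    The two thresholds are illustrative parameters, not values fixed by the model.
    """
    leaves = sum(p for d, p in distribution.items() if d <= leaf_max)
    hubs = sum(p for d, p in distribution.items() if d >= hub_min)
    return leaves, hubs


# Hypothetical toy network: one hub gene "h" interacting with ten leaf genes.
toy_edges = [("h", f"g{i}") for i in range(10)]
toy_distribution = degree_distribution(toy_edges)
toy_leaves, toy_hubs = leaf_hub_share(toy_distribution)
```

In the toy network, ten of the eleven genes are leaves and one is a hub, so the two shares sum to 100 percent.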
Masias, Victor H.; Hecking, Tobias; Crespo, Fernando; Hoppe, H. Ulrich
1 Citation
This paper proposes a methodological approach to explore the ability to detect social media users based on pedestrian networks and neighborhood attributes. We propose the use of a detection function from Spatial Capture–Recapture (SCR), a powerful analytical approach for detecting and estimating the abundance of biological populations. To test our approach, we created a set of proxy measures for the importance of pedestrian streets as well as for neighborhood attributes. The importance of pedestrian streets was measured by centrality indicators. Additionally, proxy measures of neighborhood attributes were created using multivariate analysis of census data. A series of candidate models was tested to determine which attributes are most important for detecting social media users. The results of the analysis indicate which attributes of the city have promising potential for detecting social media users. Finally, the main results and findings, limitations and extended use of the proposed methodological approach are discussed.
Tupikina, Liubov; Grebenkov, Denis S.
1 Citation
A heterogeneous continuous time random walk is an analytical formalism for studying and modeling diffusion processes in heterogeneous structures on microscopic and macroscopic scales. In this paper we study both analytically and numerically the effects of structural and temporal heterogeneities on the diffusive dynamics on different types of networks. For this purpose we investigate how the distribution of the first-passage time is affected by the global topological network properties and by heterogeneities in the distributions of the travel times. In particular, we analyze transport properties of random networks and define network measures based on the first-passage characteristics. The heterogeneous continuous time random walk framework presented in the paper has potential applications in biology, social and urban science, the search for optimal transport properties, and the analysis of the effects of heterogeneities or bursts in transportation networks.
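The first-passage time studied above can also be estimated numerically by direct simulation. The following is a minimal sketch, not the paper's analytical formalism: it assumes exponentially distributed waiting times with a node-dependent rate and unbiased jumps to neighbors, and all function and parameter names are hypothetical.

```python
import random


def first_passage_time(adjacency, rates, start, target, rng):
    """Simulate one walk and return the time at which it first reaches `target`."""
    node, elapsed = start, 0.0
    while node != target:
        # Assumption: exponential waiting time whose rate depends on the current node.
        elapsed += rng.expovariate(rates[node])
        # Assumption: unbiased jump to a uniformly chosen neighbor.
        node = rng.choice(adjacency[node])
    return elapsed


def sample_first_passage_times(adjacency, rates, start, target, n_walks=20000, seed=0):
    """Return a list of simulated first-passage times from `start` to `target`."""
    rng = random.Random(seed)
    return [first_passage_time(adjacency, rates, start, target, rng)
            for _ in range(n_walks)]


# Hypothetical example: a path graph 0 - 1 - 2 with unit rates on every node.
path_adjacency = {0: [1], 1: [0, 2], 2: [1]}
path_rates = {0: 1.0, 1: 1.0, 2: 1.0}
path_samples = sample_first_passage_times(path_adjacency, path_rates, start=0, target=2)
path_mean = sum(path_samples) / len(path_samples)
```

Heterogeneity enters through `rates`: giving some nodes a much smaller rate mimics long trapping times and changes the shape of the sampled distribution, not only its mean. The walk terminates only if the target is reachable from the start node.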
Fushimi, Takayasu; Saito, Kazumi; Ikeda, Tetsuo; Kazama, Kazuhiro
1 Citation
Many networks, including spatial networks, social networks, and web networks, are not deterministic but probabilistic due to the uncertainty of link existence. To extract densely connected nodes from networks with such uncertainty, we propose connectedness centrality and its extended version, group connectedness centrality, where the connectedness of each node is defined as the expected size of its connected component over all possible graphs produced by an uncertain graph. In a large-scale network, however, the number of possible graphs is enormous, so it is difficult to calculate the expected value exactly. Therefore, we also propose an efficient estimation method based on Monte Carlo sampling. When applying our method to road networks, the extracted nodes can be regarded as candidate sites of evacuation facilities that many residents can reach even when roads are stochastically blocked by natural disasters. In our experimental evaluations using actual road networks, we show the following promising characteristics: our proposed method 1) works stably with respect to the number of simulations; 2) extracts node sets reachable from more nodes even when many links are deleted; and 3) computes much more efficiently than existing centrality measures and community extraction methods.
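The Monte Carlo idea described above can be sketched as follows: sample graphs by keeping each link with its existence probability, and average the size of each node's connected component over the samples. This is a minimal illustration under the assumption of independent links, not the authors' implementation; the function names and input format are hypothetical.

```python
import random


def component_sizes(n_nodes, edges):
    """Return, for each node 0..n_nodes-1, the size of its connected component."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    roots = [find(x) for x in range(n_nodes)]
    size = {}
    for r in roots:
        size[r] = size.get(r, 0) + 1
    return [size[r] for r in roots]


def connectedness_centrality(n_nodes, uncertain_edges, n_samples=1000, seed=0):
    """Estimate the expected component size of every node.

    uncertain_edges: list of (u, v, p), where p is the probability that the
    link exists (links are assumed independent in this sketch).
    """
    rng = random.Random(seed)
    totals = [0] * n_nodes
    for _ in range(n_samples):
        sampled = [(u, v) for u, v, p in uncertain_edges if rng.random() < p]
        for node, s in enumerate(component_sizes(n_nodes, sampled)):
            totals[node] += s
    return [t / n_samples for t in totals]
```

For a road network, the nodes with the largest estimates would be the ones that stay reachable from many other nodes across the sampled blockage scenarios.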
Koponen, Ismo T.; Nousiainen, Maija
11 Citations
Concept maps, which are network-like visualisations of the interlinkages between concepts, are used in teaching and learning as representations of students’ understanding of conceptual knowledge and its relational structure. In science education, research on the uses of concept maps has focused much attention on finding methods to identify key concepts that are of the most importance either in supporting or being supported by other concepts in the network. Here we propose a method based on network analysis to examine students’ representations of the relational structure of physics concepts in the form of concept maps. We suggest how the key concepts and their epistemic support can be identified by focusing on the pathways along which information is passed from one node to another. To this end, concept maps are analysed as directed and weighted networks, where nodes are concepts and links represent different types of connections between concepts, and where each link is assumed to provide epistemic support to the node it is connected to. The notion of key concept can then be operationalised through the directed flow of information from one node to another in terms of communicability between the nodes, separately for outgoing and incoming weighted links. Here we analyse a collated concept network based on a sample of 12 original concept maps constructed by university students. We show that communicability is a simple and reliable way to identify the key concepts and examine their epistemic justification within the collated network. The communicabilities of the key nodes in the collated network are compared with communicabilities averaged over the set of 12 individual concept maps. The comparison shows that the collated network contains an extensive set of key concepts with good epistemic support. Every individual network contains a subset of these key concepts, but with limited overlap with the subsets of other individual networks. The epistemically well-substantiated knowledge is thus sparsely distributed over the 12 individual networks.
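For readers unfamiliar with communicability, a common definition takes the matrix exponential of the adjacency matrix, so that entry (i, j) sums all walks from i to j with walks of length k weighted by 1/k!. The sketch below uses that standard definition with a truncated power series; whether the paper applies further normalisation is not stated here, so treat the details and the function names as assumptions.

```python
def communicability(adjacency, n_terms=30):
    """Approximate exp(A) = sum_k A^k / k! for a weighted, directed adjacency matrix.

    adjacency[i][j] is the weight of the link from node i to node j.
    """
    n = len(adjacency)
    # k = 0 term of the series: the identity matrix.
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, n_terms + 1):
        # term <- term * A / k, so that term equals A^k / k! after this step.
        term = [[sum(term[i][m] * adjacency[m][j] for m in range(n)) / k
                 for j in range(n)] for i in range(n)]
        for i in range(n):
            for j in range(n):
                result[i][j] += term[i][j]
    return result


def outgoing_and_incoming(comm):
    """Row sums (outgoing) and column sums (incoming), excluding the diagonal."""
    n = len(comm)
    outgoing = [sum(comm[i][j] for j in range(n) if j != i) for i in range(n)]
    incoming = [sum(comm[i][j] for i in range(n) if i != j) for j in range(n)]
    return outgoing, incoming
```

On a directed chain 0 → 1 → 2 with unit weights, node 0 has the largest outgoing value and node 2 the largest incoming value, which matches the reading of a concept that supports others versus one that is supported by others.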
Vasiliauskaite, Vaiva; Evans, Tim S.
1 Citation
In this work we give a community detection algorithm in which the communities both respect the intrinsic order of a directed acyclic graph and group similar nodes. We take inspiration from classic similarity measures of bibliometrics, used to assess how similar two publications are based on their relative citation patterns. We study the algorithm’s performance and antichain properties in artificial models and in real networks, such as citation graphs and food webs. We show how well this partitioning algorithm distinguishes and groups together nodes of the same origin (in a citation network, the origin is a topic or a research field). We compare our partitioning algorithm with standard hierarchical layering tools as well as community detection methods. We show that our algorithm produces different communities from standard layering algorithms.
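Two classic bibliometric similarity measures of the kind alluded to above are bibliographic coupling (shared references) and co-citation (shared citers). The sketch below computes both for a pair of nodes in a directed acyclic graph; the Jaccard normalisation and the function name are assumptions for illustration, and the paper's own similarity measure may differ.

```python
def citation_similarities(edges, a, b):
    """Return (bibliographic coupling, co-citation) similarity of nodes a and b.

    edges: iterable of (citing, cited) pairs forming a directed acyclic graph.
    Both scores are Jaccard indices in [0, 1]; this normalisation is an
    assumption of the sketch, not taken from the paper.
    """
    references, citers = {}, {}
    for citing, cited in edges:
        references.setdefault(citing, set()).add(cited)
        citers.setdefault(cited, set()).add(citing)

    def jaccard(x, y):
        union = x | y
        return len(x & y) / len(union) if union else 0.0

    coupling = jaccard(references.get(a, set()), references.get(b, set()))
    cocitation = jaccard(citers.get(a, set()), citers.get(b, set()))
    return coupling, cocitation


# Hypothetical citation graph: p3 cites p1 and p2, which share reference r2.
toy_citations = [("p1", "r1"), ("p1", "r2"), ("p2", "r2"),
                 ("p2", "r3"), ("p3", "p1"), ("p3", "p2")]
toy_coupling, toy_cocitation = citation_similarities(toy_citations, "p1", "p2")
```

Here p1 and p2 cannot lie on a common directed path, so grouping them by such a similarity is compatible with the order of the graph, which is the property an antichain-based community needs.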
de Anda-Jáuregui, Guillermo; Alcalá-Corona, Sergio Antonio; Espinal-Enríquez, Jesús; Hernández-Lemus, Enrique
Transcriptional coexpression networks represent concerted gene regulation programs by means of statistical inference of coexpression patterns. The rich phenomenology of transcriptional processes behind complex phenotypes such as cancer is often captured (at least partially) in the connectivity structure of transcriptional coexpression networks. By analyzing the community structure of these networks, we may develop a deeper understanding of that phenomenology. We identified the modular structure of a transcriptional coexpression network obtained from breast cancer gene expression, as well as that of a non-cancer adjacent breast tissue network used as a control. We then analyzed the biological functions associated with the resulting communities by means of enrichment analysis. We also generated two projected networks for both the tumor and control networks. The first is a projection to a network in which nodes are communities and edges represent topologically adjacent communities, indicating coexpression patterns between them. For the second projection, a bipartite network was generated containing a layer of modules and a layer of biological processes, with links between modules and the functions in which they are enriched; from this bipartite network, a projection to the community layer was obtained. From the analysis of the communities and projections, we were able to discern distinctive patterns of regulation between tumors and controls. Even though the connectivity structures of the two transcriptional coexpression networks are quite different, the topology of the projected networks is somewhat similar, indicating functional compartmentalization in both tumor and control conditions. However, the biological functions represented in the corresponding modules turned out to be notably different, with the tumor network comprising functional modules enriched for well-known hallmarks of cancer.
Haruna, Taichi
New betweenness centralities of nodes in a directed network are proposed based on the idea that nodes in a network are processes rather than things. They are called input and output betweenness centralities. They measure the importance of nodes as inputs and outputs, respectively, for gluing arcs together as interfaces between processes. We demonstrate their use and discuss their meaning by calculating them in two toy directed networks and one real-world network. We also compare them with the existing centrality measures that reflect the asymmetry of links in directed networks: out- and in-degrees and Hub and Authority scores. We found that input and output betweenness centralities behave differently from these measures in some nodes. It is suggested that they can effectively identify nodes that are less important in terms of existing measures but are noteworthy from the viewpoint that nodes are processes.
Taheri, Aynaz; Gimpel, Kevin; Berger-Wolf, Tanya
1 Citation
We propose sequence-to-sequence architectures for graph representation learning in both supervised and unsupervised regimes. Our methods use recurrent neural networks to encode and decode information from graph-structured data. Recurrent neural networks require sequences, so we choose several methods of traversing graphs using different types of substructures with various levels of granularity to generate sequences of nodes for encoding. Our unsupervised approaches leverage long short-term memory (LSTM) encoder-decoder models to embed the graph sequences into a continuous vector space. We then represent a graph by aggregating its graph sequence representations. Our supervised architecture uses an attention mechanism to collect information from the neighborhood of a sequence. The attention module enriches our model in order to focus on the subgraphs that are crucial for the purpose of a graph classification task. We demonstrate the effectiveness of our approaches by showing improvements over the existing state-of-the-art approaches on several graph classification tasks.
Singh, Kushal Veer; Vig, Lovekesh
3 Citations
Interactomes such as protein interaction networks have many undiscovered links between entities. Experimental verification of every link in these networks is prohibitively expensive, and therefore computational methods to direct the search for possible links are of great value. The problem of finding undiscovered links in a network is also referred to as the link prediction problem. A popular approach to link prediction has been to formulate it as a binary classification problem in which class labels indicate the existence or absence of a link (we refer to these as positive links or negative links, respectively) between a pair of nodes in the network. Researchers have successfully applied such supervised classification techniques to determine the presence of links in protein interaction networks. However, it is quite common for protein-protein interaction (PPI) networks to have a large proportion of undiscovered links. Thus, a link prediction approach could incorrectly treat undiscovered positive links as negative links, thereby introducing a bias in the learning. In this paper, we propose to denoise the class of negative links in the training data via a Gaussian process anomaly detector. We show that this significantly reduces the noise due to mislabelled negative links and improves the resulting link prediction accuracy. We evaluate the approach by introducing synthetic noise into the PPI networks and measuring how accurately we can reconstruct the original PPI networks using classifiers trained on both noisy and denoised data. Experiments were performed with five different PPI network datasets, and the results indicate a significant reduction in bias due to label noise and, more importantly, a significant improvement in the accuracy of detecting missing links via classification.
