Reproducing co-occurrence networks in Rstudio #1081
Replies: 8 comments
-
Hello, I'm really happy to hear that KH Coder was useful for you!
For community detection, KH Coder uses the "fastgreedy.community" (default) or "walktrap.community" (random walks) function of igraph.
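As a minimal sketch, the two community detection methods can be compared on a toy graph. This assumes igraph 1.0 or later, where these functions were renamed. `make_graph("Zachary")` is only a stand-in for a real co-occurrence network. The question's code uses `cluster_fluid_communities()`, which is a different algorithm, so its partitions should not be expected to match KH Coder's.

```r
library(igraph)

# Toy undirected graph standing in for a co-occurrence network
# (Zachary's karate club, built into igraph).
g <- make_graph("Zachary")

# cluster_fast_greedy() is the current name of fastgreedy.community();
# cluster_walktrap() is the current name of walktrap.community().
# Both use an edge attribute called "weight" automatically if it exists.
fg <- cluster_fast_greedy(g)
wt <- cluster_walktrap(g)

# One community id per vertex, ready to be mapped to colours.
V(g)$comm_fg <- membership(fg)
V(g)$comm_wt <- membership(wt)

# Number of communities and modularity found by each method.
print(c(fast_greedy = length(fg), walktrap = length(wt)))
print(c(fast_greedy = modularity(fg), walktrap = modularity(wt)))
```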
For the layout, KH Coder uses "layout.fruchterman.reingold" of igraph and "wordlayout" of wordcloud.
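A minimal sketch of the Fruchterman-Reingold part, assuming a recent igraph where `layout.fruchterman.reingold()` is available as `layout_with_fr()`. The toy graph and the seed value are arbitrary. `wordcloud::wordlayout()` can additionally shift label coordinates so that words do not overlap; exactly how KH Coder combines the two is best checked in the saved R source.

```r
library(igraph)

g <- make_graph("Zachary")

# Fruchterman-Reingold starts from random positions, so fix the seed
# to get the same picture on every run.
set.seed(42)
# layout_with_fr() is the current name of layout.fruchterman.reingold().
coords <- layout_with_fr(g)

# Rescale both axes to [-1, 1] so the layout fits plot.igraph()'s
# default xlim/ylim of c(-1, 1) when its own rescaling is switched off.
coords <- norm_coords(coords, xmin = -1, xmax = 1, ymin = -1, ymax = 1)

plot(g, layout = coords, rescale = FALSE,
     vertex.size = 8, vertex.label.cex = 0.7)
```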
Jaccard is the default measure, but you can choose Cosine, Dice, etc. in KH Coder's interface.
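For reference, here is a sketch of how Jaccard and Dice coefficients between words can be computed from a document-term matrix in base R. The matrix `d` and its words are made up for illustration, and the code follows the standard definitions of the two coefficients; it is not taken from KH Coder.

```r
# Hypothetical document-term matrix: rows = documents, columns = words,
# cells = how often each word occurs in each document.
d <- matrix(c(1, 0, 2, 0,
              1, 1, 0, 0,
              0, 1, 1, 1,
              1, 1, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4),
                            c("tea", "milk", "sugar", "ice")))

# Jaccard and Dice only use presence/absence of a word in a document.
b <- (d > 0) * 1

both   <- crossprod(b)                      # docs containing both words i and j
n_doc  <- diag(both)                        # docs containing each word
either <- outer(n_doc, n_doc, "+") - both   # docs containing at least one of them

jaccard <- both / either
dice    <- 2 * both / outer(n_doc, n_doc, "+")

print(round(jaccard, 2))
```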
Could you "Save" KH Coder's co-occurrence network as an "R Source" file and look into it? The code is not clean at all, but it may help. In my experience, I had to turn off the "R Diagnostics" feature of RStudio; it was too heavy for KH Coder-generated code. Also, igraph's output will change when the igraph version changes, so if you want exactly the same results, you should run the R that comes with KH Coder in the deps\R folder. Best,
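To check which versions produced a given result, and to make force-directed layouts repeatable within one setup, something like the following can help. The toy graph and seed are arbitrary, and a fixed seed does not make results match across different igraph versions.

```r
library(igraph)

# Record the versions behind a result, e.g. to compare with the R
# bundled with KH Coder.
cat(R.version.string, "\n")
cat("igraph", as.character(packageVersion("igraph")), "\n")

# With the same seed (and the same R/igraph versions) a force-directed
# layout is reproduced exactly.
g <- make_graph("Zachary")
set.seed(123); l1 <- layout_with_fr(g)
set.seed(123); l2 <- layout_with_fr(g)
print(identical(l1, l2))
```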
-
Hi Mr. Higuchi Koichi, Thank you a lot for your timely reply! Your answers helped me a lot. I found the Perl script "network.pm" and managed to extract R code from it. It's pretty long, so I'll spend some time reading it. I'll try my best to understand it, but you are clearly much more proficient and knowledgeable in R and statistics than I am, so I'll probably ask you again some time in the future. Also, FYI, the projects I'm working on analyze corpora from several Chinese social media platforms, such as Red, TikTok, Weibo, etc., and provide insights to help some Japanese consumer-goods brands thrive in China. You must be happy that your program is contributing to your country's brands' success overseas! Best,
-
Hi, Wow, did you get R code from Perl source code?! Yes, that will work. But just for confirmation, and for others using KH Coder, I'd like to point out that almost all plots in KH Coder can be saved as R code by pressing the "Save" button. Any additional questions are welcome. Best,
-
Hi Mr. Koichi, This is really helpful. I never noticed that R source code can be saved. When I examined the source code saved from a co-occurrence network, I found that it starts with two matrices constructed from numeric vectors. One is "d", which looks like a word count matrix with each verbatim as an observation (I already knew how to make it), and the other is named "doc_length_mtr". Since those numbers are somehow generated from my raw input, I couldn't figure out what the second matrix is. It contains two columns, "length_c" and "length_w". Could you please let me know how you come up with doc_length_mtr? I don't need the code; knowing how it is calculated behind the scenes is good enough for me. Thanks a lot!
-
I might have just figured it out... but can you confirm?
Thank you a lot!
-
Hi, If you chose Jaccard, I think that matrix would not be used at all, so guessing what it is may be difficult... It holds the length of each document in words (w) and in characters (c). The length in words would be used to standardize term frequency when you select Cosine.
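Based on the description above, a `doc_length_mtr`-like matrix and a length-standardized cosine similarity might be sketched as below. The tokens are hypothetical. Counting `length_c` as the characters of the tokens without spaces is an assumption, and so is plain division by `length_w`; KH Coder's exact scaling should be checked in the saved R source.

```r
# Hypothetical tokenised documents (e.g. output of a word segmenter).
docs <- list(
  doc1 = c("milk", "tea", "tastes", "good"),
  doc2 = c("milk", "tea", "too", "sweet"),
  doc3 = c("iced", "tea")
)

# Document-term matrix d: one row per document, one column per word.
vocab <- sort(unique(unlist(docs)))
d <- t(sapply(docs, function(tokens) table(factor(tokens, levels = vocab))))

# doc_length_mtr-like matrix: length in characters (assumed: characters of
# the tokens, spaces excluded) and length in words (number of tokens).
doc_length_mtr <- cbind(
  length_c = sapply(docs, function(tokens) sum(nchar(tokens))),
  length_w = lengths(docs)
)

# Standardise term frequency by document length in words
# (plain division is an assumption; KH Coder may rescale further).
tf_std <- d / doc_length_mtr[, "length_w"]

# Cosine similarity between word columns.
norms  <- sqrt(colSums(tf_std^2))
cosine <- crossprod(tf_std) / outer(norms, norms)
print(round(cosine, 2))
```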
-
Thank you for the quick reply. Sorry I wasn't more specific. You are right: I was using Cosine distance to generate the chart. Feeding such a big matrix of raw numbers into RStudio crashed the IDE all the time, which is why I was trying to figure out how to generate those matrices myself and asked the questions above. Thank you again~
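One way to avoid pasting a very large numeric matrix into a script (the kind of long generated code that RStudio's diagnostics struggled with in this thread) is to save the matrix to a file once and read it back. The random matrix and the file name below are placeholders.

```r
# Placeholder document-term matrix; in practice this is built by your
# own preprocessing code instead of being typed into the script.
set.seed(1)
d <- matrix(rpois(20, lambda = 2), nrow = 4,
            dimnames = list(paste0("doc", 1:4), paste0("w", 1:5)))

# Save once (e.g. at the end of the preprocessing script) ...
path <- file.path(tempdir(), "dtm.rds")
saveRDS(d, path)

# ... and load it in the plotting script instead of a huge matrix(c(...)) literal.
d2 <- readRDS(path)
print(identical(d, d2))
```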
-
After applying this setting, RStudio stopped crashing in my environment. Best,
-
To whom it may concern,
I'm very grateful that you published such user-friendly software for text analysis. It has truly helped me and my team build many beautiful and meaningful analyses and visualizations.
At the moment, as a moderate R user (but a heavy Python user in recent years), I'm looking to reproduce the co-occurrence networks in R, because this will give us more flexibility to tweak anything we'd like. However, after carefully learning how the input data for a co-occurrence network is calculated and how to draw networks with the igraph library, I keep ending up with a graph that is similar but always a bit off.
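A sketch of turning a word-by-word similarity matrix into the edge and vertex data frames that `graph_from_data_frame()` expects (the roles played by `df_nodes` and `df_vertice` in the question). The similarity values and word frequencies are hypothetical.

```r
library(igraph)

# Hypothetical word-by-word similarity matrix (symmetric, e.g. Jaccard)
# and word frequencies; in practice both come from the document-term matrix.
words <- c("tea", "milk", "sugar", "ice")
sim <- matrix(c(1.00, 0.50, 0.50, 0.00,
                0.50, 1.00, 0.50, 0.33,
                0.50, 0.50, 1.00, 0.33,
                0.00, 0.33, 0.33, 1.00),
              nrow = 4, dimnames = list(words, words))
freq <- c(tea = 3, milk = 3, sugar = 4, ice = 1)

# Keep each unordered pair once: upper triangle, no diagonal, no zero weights.
idx <- which(upper.tri(sim) & sim > 0, arr.ind = TRUE)
edges <- data.frame(from   = rownames(sim)[idx[, "row"]],
                    to     = colnames(sim)[idx[, "col"]],
                    weight = sim[idx])

# Vertex table: the first column must hold the vertex names.
vertices <- data.frame(name = words, freq = freq[words])

net <- graph_from_data_frame(edges, directed = FALSE, vertices = vertices)
```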
Since I'm a data scientist, not a programmer, I can't read the source code posted on this site for the network. I wonder if you could provide more details on the questions below about generating a similar network.
Here are some crucial steps I take:
Below are some problems I just couldn't figure out and hope you can shed some light on:
Below is some simplified code I used to generate my graph:
```r
library(igraph)
library(readxl)

# Edge list (first two columns: the two words of each link, plus weight)
# and vertex table (name, freq, ...).
# Use forward slashes in paths: "\i" is not a valid escape in an R string.
df_nodes <- read_excel('.../info_nodes.xlsx')
df_vertice <- read_excel('.../info_vertex.xlsx')

# Turn the data frames into an igraph object
net <- graph_from_data_frame(d = df_nodes, vertices = df_vertice, directed = FALSE)
class(net)

# Keep only the strongest links (top 5% of edge weights)
cut_edge <- quantile(df_nodes$weight, 0.95)
net.sp <- delete_edges(net, E(net)[weight < cut_edge])

# View the attributes of the remaining network
edge_density(net.sp)
ecount(net.sp)
V(net)$freq
V(net)$name

# Try to detect communities (requires a connected, simple graph)
net_clust <- cluster_fluid_communities(net, no.of.communities = 7)
V(net)$type <- net_clust$membership

# Plot, colouring each community differently.
# delete_edges() keeps every vertex, so V(net)$type lines up with V(net.sp).
colors <- adjustcolor(c("gray50", "tomato", "gold", "yellowgreen", "blue", "lightblue", "orange"), alpha.f = 0.6)
plot(net.sp,
     edge.arrow.size = 0.2, edge.curved = 0.2, edge.color = 'lightblue',
     vertex.size = log(V(net.sp)$freq + 1) * 2,
     vertex.color = colors[V(net)$type], vertex.label.color = 'black',
     vertex.label = V(net.sp)$name, vertex.label.cex = 0.8,
     layout = layout_on_sphere(net.sp))
```
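As a possible variation on the code above, the sketch below keeps a fixed number of the strongest edges instead of a quantile cut-off, drops vertices left without edges, and runs `cluster_fast_greedy()` (the default mentioned at the top of this thread) on the graph that is actually plotted. `top_n` and the random weights on the Zachary graph are hypothetical stand-ins, and whether KH Coder filters edges this way is not stated in the thread.

```r
library(igraph)

# Stand-in weighted graph; in the question this role is played by `net`.
set.seed(1)
net <- make_graph("Zachary")
E(net)$weight <- runif(ecount(net))

# Keep a fixed number of the strongest edges (top_n is a hypothetical choice).
top_n <- 20
keep <- order(E(net)$weight, decreasing = TRUE)[seq_len(min(top_n, ecount(net)))]
net.sp <- delete_edges(net, setdiff(seq_len(ecount(net)), keep))

# Drop words that lost all their links, so they are not drawn as loose dots.
net.sp <- delete_vertices(net.sp, which(degree(net.sp) == 0))

# Detect communities on the graph that is actually drawn
# (cluster_fast_greedy() uses E(net.sp)$weight automatically).
comm <- cluster_fast_greedy(net.sp)
V(net.sp)$comm <- membership(comm)

set.seed(42)
plot(net.sp, layout = layout_with_fr(net.sp),
     vertex.color = V(net.sp)$comm, vertex.label.cex = 0.8)
```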
That is all I have at this time. I hope to hear from you soon, and thank you very much in advance for anything you share!
Sincerely and best,
Saining Zhang