INDEX
Explanations
phrases related to social or psychological actions and states
phrases indicating social dynamics and behaviors
Negative Logits
referring       -0.56
Sloan           -0.54
orsi            -0.53
ère             -0.53
Citation        -0.52
Bearing         -0.51
hemor           -0.50
2002            -0.50
1902            -0.50
demonstrating   -0.50
Positive Logits
iped                0.61
natureconservancy   0.59
olitical            0.57
aband               0.57
endas               0.57
rows                0.55
blers               0.54
thood               0.54
hetics              0.53
aughs               0.52
Activations Density: 1.263%