INDEX
Explanations
phrases related to direct comparisons or competitions
references to physical confrontations or interactions
Negative Logits
Polk      -0.69
tremend   -0.68
otten     -0.67
Roose     -0.66
iom       -0.66
live      -0.65
anson     -0.65
gnu       -0.64
oras      -0.63
orks      -0.62
Positive Logits
GHz               0.69
conversations     0.69
bilingual         0.68
transsexual       0.66
interactions      0.66
ratio             0.65
ALSE              0.64
comparisons       0.63
Indonesian        0.63
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯  0.63
Activations Density 0.045%