Explanations
names of authors and years in citations
Negative Logits
Token          Logit
pride          -0.79
Confederate    -0.66
patriotic      -0.65
surfing        -0.65
wasteland      -0.64
vanity         -0.64
Loaded         -0.64
cowboy         -0.64
Destination    -0.63
Flash          -0.63
Positive Logits
Token          Logit
et             1.45
owsky          1.17
opoulos        1.10
elli           1.09
itsch          1.06
oglu           1.05
gaard          1.05
asio           1.03
sson           1.02
insky          1.01
Activations Density: 0.200%