Explanations
the presence of specific place names or references to geographic locations
Negative Logits
ending    -0.21
ause      -0.17
ete       -0.17
atty      -0.17
acks      -0.16
ubic      -0.16
illow     -0.16
aper      -0.16
otify     -0.16
izza      -0.16
Positive Logits
eking       0.18
aph         0.17
omez        0.16
utenberg    0.16
annon       0.15
ley         0.15
adel        0.15
Snape       0.15
azard       0.15
lobs        0.14
Activations Density: 0.044%