Explanations
phrases indicating pressure or requests from external sources
the word "from" indicating sources or origins of information
Negative Logits
Token         Logit
merce         -0.83
hene          -0.71
omial         -0.70
imate         -0.68
mone          -0.68
mix           -0.67
isode         -0.66
nets          -0.65
isec          -0.65
idy           -0.65
Positive Logits
Token         Logit
afar          1.57
abroad        1.27
within        1.12
inside        1.04
elsewhere     1.04
superiors     1.03
outside       1.03
constituents  1.01
outsiders     0.99
strangers     0.93
Activations Density: 0.144%