INDEX
Explanations
phrases related to location or direction
references to statistical and numerical concepts
Negative Logits
roups      -0.93
zens       -0.86
illions    -0.83
undreds    -0.78
nces       -0.76
enaries    -0.74
itures     -0.73
endars     -0.73
azines     -0.70
ousands    -0.69
Positive Logits
antagonist        0.74
adversary         0.70
abuser            0.69
dreaded           0.69
torment           0.67
dilemma           0.67
susceptibility    0.66
intruder          0.66
perpetrator       0.65
traitor           0.64
Activations Density 0.999%