INDEX
Explanations
punctuation marks and colons that introduce lists or statements
Negative Logits
TNT        -0.61
Reloaded   -0.59
bour       -0.55
abyss      -0.54
IZ         -0.54
ELD        -0.53
Zombies    -0.53
maid       -0.52
erville    -0.51
ages       -0.51
Positive Logits
ividual    1.00
vote       0.73
cknow      0.73
keep       0.71
imize      0.70
listen     0.67
interfere  0.66
gradation  0.66
try        0.66
peat       0.65
Activations Density: 0.424%