Explanations
the presence of the word "any"
occurrences of the word "any"
Negative Logits
rex       -0.76
plex      -0.72
seless    -0.70
ean       -0.69
gal       -0.68
ward      -0.67
itton     -0.66
iano      -0.66
visor     -0.66
gif       -0.66
Positive Logits
THING          1.15
particular     0.94
ones           0.92
whatsoever     0.90
significant    0.90
WHERE          0.89
longer         0.89
ONE            0.83
meaningful     0.83
substantive    0.80
Activations Density: 0.070%