INDEX
Explanations
words related to lists or sequential information
the presence of specific repeated characters or symbols
Negative Logits
Token        Logit
diminishing  -0.67
chunks       -0.66
ignition     -0.65
fading       -0.65
suicide      -0.65
scattering   -0.64
hydrogen     -0.62
slowing      -0.61
Mayer        -0.60
literacy     -0.60
Positive Logits
Token    Logit
agree    1.08
mand     1.05
ï¸ı      1.04
have     1.02
need     0.99
want     0.97
know     0.97
require  0.96
intend   0.95
alm      0.94
Activations Density 0.123%
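
A minimal sketch of how a density figure like the one above could be computed, assuming it means the percentage of token positions on which the feature's activation is above zero. The page does not state its definition, so the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all; the dashboard's exact definition is not given.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical data: 10,000 token positions, 12 of which activate the feature.
sample = [0.0] * 9988 + [1.5] * 12
density = activation_density(sample)
print(f"Activations Density {density:.3f}%")
```

Under this reading, a density of 0.123% would mean the feature fires on roughly one token position in every 800.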