INDEX
Explanations
names of personalities or public figures
vowel-heavy words
Negative Logits
Reloaded      -0.80
srfAttach     -0.77
CLASSIFIED    -0.74
ADE           -0.71
DERR          -0.70
Sharp         -0.69
withd         -0.67
ãĥķãĤ©        -0.66
ENDED         -0.66
ãĥ¯ãĥ³        -0.64
Positive Logits
ghan          1.09
pless         0.99
ghai          0.96
pling         0.91
vel           0.91
lder          0.90
vern          0.90
ju            0.88
veland        0.86
ze            0.86
Activations Density 0.121%
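
Two entries in the negative logits list (ãĥķãĤ© and ãĥ¯ãĥ³) look like raw byte-level BPE token strings rather than readable text. The sketch below assumes the underlying model uses a GPT-2-style byte-level tokenizer, which this page does not state. Under that assumption, each displayed character stands for one byte, and inverting the byte-to-character table recovers the UTF-8 text. The function names `bytes_to_unicode` and `decode_token` are illustrative, not part of the dashboard.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2-style byte -> printable character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes are shifted to code points starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map each displayed character back to its byte, then decode as UTF-8."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


# The two tokens from the list, written with escapes to avoid copy-paste damage.
token_a = "\u00e3\u0125\u0137\u00e3\u0124\u00a9"  # displayed as ãĥķãĤ©
token_b = "\u00e3\u0125\u00af\u00e3\u0125\u00b3"  # displayed as ãĥ¯ãĥ³

print(decode_token(token_a))  # フォ
print(decode_token(token_b))  # ワン
```

If the assumption holds, both entries are katakana fragments, and plain ASCII tokens such as Reloaded decode to themselves.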