Explanations
symbols or characters that are not standard letters or punctuation
Negative Logits
Token      Logit
.twitch    -0.16
(mc        -0.15
Vladim     -0.15
будь       -0.15
IRR        -0.15
Fahr       -0.14
Qui        -0.14
lfw        -0.14
Cyr        -0.14
@Id        -0.14
Positive Logits
Token      Logit
Scott      0.20
Green      0.19
Thomas     0.18
Blog       0.18
Carter     0.17
King       0.17
White      0.17
Nelson     0.17
Young      0.17
Ryan       0.17
Activations Density 0.005%