INDEX
Explanations
capital letters followed by a 'K'
Negative Logits
andon     -0.16
rott      -0.15
iego      -0.15
Contrast  -0.14
umann     -0.14
unas      -0.14
Cas       -0.14
μÏĢο      -0.14
qu        -0.14
ium       -0.14
Positive Logits
K         0.20
oen       0.17
oko       0.14
AKE       0.14
LM        0.14
k         0.14
atsu      0.14
inks      0.14
oser      0.14
kus       0.14
Activations Density 0.084%
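
The density figure above is read here as the percentage of sampled tokens on which the feature has a nonzero activation. That reading is an assumption, since the page does not define the term. A minimal Python sketch under that assumption follows; the helper name `activation_density` is hypothetical and the sample activations are illustrative, not taken from this feature.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Illustrative data only: 1 active token out of 1,000.
sample = [0.0] * 999 + [3.2]
print(f"{activation_density(sample):.3f}%")
```

Under the same reading, a density of 0.084% would correspond to roughly 84 active tokens per 100,000 sampled.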