Explanations
introductory phrases that emphasize a point or transition in thought
Negative Logits
Token      Logit
kaar       -0.19
-League    -0.17
udeau      -0.17
nič        -0.16
abad       -0.16
ungan      -0.16
Král       -0.15
ruc        -0.15
prung      -0.15
poons      -0.14
Positive Logits
Token      Logit
far        0.21
-called    0.20
iled       0.20
ething     0.18
iling      0.18
far        0.17
ber        0.17
aking      0.17
fter       0.16
jaw        0.16
Activations Density 0.060%