INDEX
Explanations
phrases that indicate guidance or instruction
Negative Logits
p          -0.16
chner      -0.15
m          -0.15
D          -0.15
itud       -0.15
something  -0.15
atos       -0.14
ová        -0.14
aug        -0.14
atas       -0.14
Positive Logits
akat       0.17
means      0.17
elik       0.16
(By        0.16
rek        0.15
-by        0.15
(by        0.15
by         0.15
Yourself   0.15
elor       0.14
Activations Density: 0.104%