INDEX
Explanations
phrases related to instructions or guidance for actions
Negative Logits
Token    Logit
HING     -0.14
uder     -0.14
šet      -0.14
'gc      -0.14
šk       -0.14
yg       -0.13
endas    -0.13
ihan     -0.13
oli      -0.13
agne     -0.13
Positive Logits
Token    Logit
two      0.57
three    0.56
several  0.45
two      0.43
three    0.42
four     0.38
trois    0.38
两       0.37
drei     0.36
certain  0.36
Activations Density: 0.488%