Explanations
phrases that indicate assistive actions and support
Negative Logits
while         -0.07
whereas       -0.07
gratuitement  -0.07
arena         -0.06
nat           -0.06
ellij         -0.06
ftp           -0.06
notamment     -0.06
while         -0.06
ersen         -0.06
Positive Logits
always        0.12
always        0.11
ALWAYS        0.10
Always        0.10
siempre       0.10
vždy          0.10
Always        0.10
всегда        0.09
đều           0.09
sempre        0.09
Activations Density 0.044%