Explanations
phrases indicating frequency or typical behavior
Negative Logits
Token       Logit
sprá        -0.58
braccia     -0.56
Clements    -0.53
terpisah    -0.52
Bringing    -0.51
شدند        -0.51
ней         -0.51
labios      -0.50
akt         -0.50
Building    -0.49
Positive Logits
Token       Logit
usually     1.91
Usually     1.81
Usually     1.77
usually     1.67
typically   1.53
Typically   1.51
Typically   1.50
normally    1.49
normally    1.46
typically   1.44
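The two lists rank vocabulary tokens by how strongly the feature pushes their logits up or down. Below is a minimal sketch of how such rankings can be produced. It assumes (this is not stated in the dashboard) that a token's logit effect is the feature's decoder direction projected through the model's unembedding matrix `W_U` of shape (d_model, vocab_size). The function name `top_logit_effects` and the toy data are hypothetical.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most-promoted and k most-suppressed tokens.

    Assumption (hypothetical): the logit effect of a feature equals its
    decoder direction (d_model,) multiplied by the unembedding W_U
    (d_model, vocab_size), giving one score per vocabulary token.
    """
    effects = decoder_dir @ W_U  # shape: (vocab_size,)
    order = np.argsort(effects)  # token indices sorted by ascending effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: a 2-dimensional model with a 5-token vocabulary.
vocab = ["usually", "typically", "normally", "Building", "akt"]
W_U = np.array([[1.9, 1.5, 1.4, -0.5, -0.49],
                [0.0, 0.0, 0.0, 0.0, 0.0]])
decoder_dir = np.array([1.0, 0.0])
positive, negative = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(positive)
print(negative)
```

Tokens that look identical in the lists (for example the two "usually" entries) can be distinct vocabulary items, so a ranking like this one keeps them as separate rows.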
Activation Density: 0.153% (share of tokens on which the feature is active)
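A density of 0.153% means the feature fires on roughly 1.5 of every 1,000 tokens. The sketch below shows one way to compute such a figure. It assumes that "active" means an activation strictly greater than zero. The helper name `activation_density` and the `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption (hypothetical): a feature counts as active on a token
    when its activation is strictly greater than the threshold.
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Toy activations for 10 tokens: the feature fires on 2 of them.
acts = [0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.0]
density = activation_density(acts)
print(f"{density * 100:.1f}%")  # density expressed as a percentage
```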