INDEX
Explanations
reflexive verbs or actions involving self-reference
Negative Logits
actualité    -0.71
voorbeeld    -0.70
Majefty      -0.67
Prist        -0.67
Yud          -0.61
ángulo       -0.60
Pois         -0.60
Ruff         -0.59
flavor       -0.59
flavors      -0.57
Positive Logits
se         1.09
haberse    0.85
се         0.79
"},        0.74
ית         0.73
się        0.72
']==       0.70
נ          0.70
يتم        0.70
si         0.69
Activations Density 0.058%