INDEX
Explanations
phrases emphasizing directionality and location
Negative Logits
africaine  -0.39
africain   -0.39
بان        -0.35
объектов   -0.35
chines     -0.34
Mancha     -0.34
대해       -0.34
ailleurs   -0.34
ár         -0.34
ofern      -0.34
Positive Logits
RIGHT            0.78
Directly         0.77
right            0.76
directly         0.74
directly         0.71
Right            0.71
addGap           0.69
direttamente     0.67
непосредственно  0.67
Immediately      0.66
Activations Density: 0.255%