INDEX
Explanations
phrases indicating comparison or relationships between entities
Negative Logits
orem        -0.19
ik          -0.16
exactly     -0.15
iland       -0.14
an          -0.14
indeed      -0.14
CLUDING     -0.14
exact       -0.14
chg         -0.13
Policies    -0.13
Positive Logits
other       0.17
other       0.17
лиц         0.16
Cruc        0.16
êu          0.15
سایر        0.15
'autres     0.14
tual        0.14
altre       0.14
EEK         0.14
Activations Density 0.026%
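
The density figure above is presumably the fraction of sampled token positions on which this feature fires at all. A minimal sketch of that calculation follows, assuming "active" means an activation strictly greater than zero. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not the data behind the reported figure.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a position counts as active when its activation is
    strictly greater than `threshold` (zero by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 26 active positions out of 100,000 tokens.
sample = [1.5] * 26 + [0.0] * 99_974
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the feature is sparse: it stays at zero on almost every token and fires only in a narrow set of contexts.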