Explanations
language related to retaliation and aggressive actions
Negative Logits
ervo            -0.15
OVERRIDE        -0.14
anki            -0.14
okud            -0.14
ütün            -0.14
unan            -0.13
ica             -0.13
YNAM            -0.13
ImageContext    -0.13
isí             -0.13
Positive Logits
against         0.38
against         0.31
Against         0.29
Against         0.27
for             0.24
措施            0.21
gegen           0.19
measures        0.19
tegen           0.19
Measures        0.19
Activations Density 0.026%