Negative Logits
Token     Logit
oč        -0.18
unlike    -0.15
ァ        -0.15
mdir      -0.14
antom     -0.13
ogn       -0.13
evin      -0.13
annis     -0.13
utin      -0.13
974       -0.13
Positive Logits
Token     Logit
same      1.10
same      1.02
Same      0.93
Same      0.91
SAME      0.82
同        0.79
_same     0.77
mismo     0.75
mesma     0.71
相同      0.71
Activations Density: 0.470%