Explanations
references to harm or negative events
Negative Logits
lea      -0.15
agna     -0.14
ivas     -0.14
olley    -0.14
iki      -0.14
ael      -0.14
ken      -0.14
atra     -0.13
hangi    -0.13
coni     -0.13
Positive Logits
çļĦæĺ¯          0.15
because         0.15
porte           0.15
/favicon        0.14
ãģłãģijãģ§       0.14
.Unsupported    0.14
also            0.14
ashi            0.14
اسÙħ            0.14
ark             0.14
Activations Density 0.318%