INDEX
Explanations
mentions of architectural features and spatial arrangements
Negative Logits
-has
-0.15
yster
-0.15
ibu
-0.15
olmadıģını
-0.14
’яз
-0.14
ندارد
-0.14
ند
-0.14
ią
-0.14
hadn
-0.14
不会
-0.13
Positive Logits
are
0.53
的是
0.35
were
0.34
lies
0.32
_are
0.32
is
0.31
there
0.30
lie
0.30
Are
0.29
estão
0.29
Activations Density 0.255%
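
The density figure can be read as the share of token positions on which the feature fires. The page does not state the exact definition, so the sketch below assumes that "fires" means an activation strictly above a threshold of 0. The function name `activation_density` and the sample data are hypothetical and chosen only to reproduce a figure of the same size.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is strictly
    greater than the threshold (0 by default).
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 255 of 100,000 token positions.
acts = [1.0] * 255 + [0.0] * (100_000 - 255)
density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low means the feature is sparse: it is silent on the vast majority of tokens and activates only on a narrow set of contexts.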