Negative Logits
Token      Logit
the        -1.62
a          -1.36
an         -1.31
both       -1.05
all        -1.05
some       -1.05
either     -1.03
another    -1.02
its        -0.99
even       -0.97
Positive Logits
Token      Logit
<bos>      2.03
a          0.96
'          0.95
e          0.91
i          0.88
A          0.86
p          0.84
nakalista  0.83
o          0.83
ا          0.83
Activations Density: 1.167%