Negative Logits
ſeveral     -0.69
Majefty     -0.68
belec       -0.67
Sopho       -0.62
Suivez      -0.60
diſt        -0.60
Eccle       -0.59
Zeno        -0.59
pilgri      -0.58
lück        -0.58
Positive Logits
<bos>       0.80
“           0.68
‘           0.58
“           0.57
the         0.53
مصادر       0.50
c           0.50
’           0.48
not         0.48
endregion   0.48
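The page does not state how these scores were produced. A common convention for sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and list the vocabulary entries with the lowest ("negative") and highest ("positive") results. The sketch below assumes that convention. The helper names (logit_effects, top_and_bottom) and the toy matrices are hypothetical and are not taken from the dashboard. Real pipelines may also fold in layer norm or rescale the scores.

```python
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], w_u: List[List[float]]) -> List[float]:
    """Project a feature's decoder direction through an unembedding matrix.

    Assumed convention (not confirmed by the dashboard):
    decoder_dir has length d_model; w_u has d_model rows and vocab_size columns.
    Returns one logit contribution per vocabulary entry.
    """
    if len(decoder_dir) != len(w_u):
        raise ValueError("decoder_dir and w_u must share d_model")
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[m] * w_u[m][t] for m in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    vocab: List[str], scores: List[float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens, lowest score first
    positive = ranked[::-1][:k]  # most promoted tokens, highest score first
    return negative, positive


# Toy, made-up numbers: d_model = 2 and a 4-token vocabulary.
vocab = ["<bos>", "the", "not", "Zeno"]
decoder_dir = [1.0, -0.5]
w_u = [
    [0.8, 0.5, 0.4, -0.6],
    [0.0, 0.0, -0.1, 0.0],
]
scores = logit_effects(decoder_dir, w_u)
negative, positive = top_and_bottom(vocab, scores, k=2)
for token, score in positive:
    print(f"{token:<8}{score:.2f}")
```

Under this reading, a positive score means the feature, when active, pushes the model toward predicting that token, and a negative score means it pushes the model away from it.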
Activation Density 0.860%
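The density is reported here as a bare percentage. A common reading is the share of sampled tokens on which the feature's activation is above zero, so 0.860% would mean the feature fires on roughly 1 token in 116. The sketch below assumes that definition. The function name, the zero threshold, and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Made-up sample: the feature fires on 860 of 100,000 tokens.
acts = [0.0] * 99_140 + [1.3] * 860
density = activation_density(acts)
print(f"Activation Density {density:.3f}%")
```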