Explanations
the word "and" preceded or followed by a number, a preposition, or an article
Negative Logits
Token           Logit
<bos>           -0.88
Personensuche   -0.78
↵↵              -0.76
the             -0.75
"               -0.66
at              -0.66
“               -0.66
I               -0.64
                -0.62
for             -0.62
Positive Logits
Token           Logit
Efq             1.70
myſelf          1.45
itſelf          1.41
Jefus           1.38
Reſ             1.37
himſelf         1.37
Eſ              1.36
Theſe           1.36
raiſ            1.35
Anſ             1.35
Activations Density 0.691%