Explanations
negations and phrases indicating something does not occur or is not true
Negative Logits
increí        -0.90
Geiſt         -0.87
dieſe         -0.86
Geſch         -0.85
يتيمه         -0.85
ſeine         -0.83
müſſen        -0.82
miniaturka    -0.80
témoig        -0.80
zuſammen      -0.79
Positive Logits
.              0.67
↵              0.66
(not shown)    0.63
<bos>          0.60
↵↵             0.56
"              0.56
,              0.54
(              0.51
(not shown)    0.47
:              0.47
Activations Density 0.241%
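Activation density is usually read as the fraction of sampled tokens on which the feature fires at all. The sketch below assumes that definition, with "fires" meaning an activation strictly above a threshold of zero; the function name `activation_density`, the threshold, and the toy data are assumptions for illustration, not details confirmed by the source.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation is strictly above the threshold."""
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    active = sum(1 for value in activations if value > threshold)
    return active / len(activations)


# Toy activations: the feature fires on one of four tokens (illustrative only).
toy_activations = [0.0, 0.0, 1.3, 0.0]
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints "25.000%"
```

Under this reading, a density of 0.241% would mean the feature is active on roughly 24 of every 10,000 sampled tokens.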