Explanations
repetitive usage of the word "the" in various contexts
Negative Logits
,         -0.47
I         -0.44
and       -0.39
.         -0.39
in        -0.38
I         -0.37
          -0.37
for       -0.37
2         -0.36
failed    -0.36
Positive Logits
ویکیپدیا    0.75
increí      0.72
ſſung       0.72
ſelves      0.69
ſei         0.69
ſelf        0.66
GIVEREF     0.66
<unused42>  0.65
<unused23>  0.64
<unused43>  0.64
Activations Density 0.498%