Explanations
references to subsections and numbered lists within the text
Negative Logits
-0.71
-0.71    (
-0.67    var
-0.62    //
-0.61    gu
-0.60    var
-0.59    *
-0.58    الع
-0.57    Gu
-0.57    сс
Positive Logits
2.51    subsection
2.04    subsubsection
1.42    himſelf
1.40    itſelf
1.40    myſelf
1.26    BibitemShut
1.24    ་་
1.23    ―――――
1.20    autorytatywna
1.20    Jefus
Activations Density: 0.033%