INDEX
Explanations
references to scientific notation or citations in a document
Negative Logits
"     -0.77
in    -0.74
on    -0.70
to    -0.69
"     -0.68
a     -0.67
E     -0.66
-     -0.66
}     -0.65
.     -0.64
Positive Logits
myſelf      1.38
(\<         1.33
(£          1.32
^(@         1.30
(§          1.28
(€          1.24
(°          1.23
pleaſure    1.23
$(-         1.22
((*         1.22
Activations Density 0.526%