INDEX
Explanations
code snippets, file names, and directory names
code and logs
Negative Logits
Token       Logit
itſelf      -0.90
$_"         -0.89
crdi        -0.88
pleaſure    -0.86
raiſ        -0.86
uſed        -0.86
elettrica   -0.85
myſelf      -0.84
―――――       -0.82
Jefus       -0.81
Positive Logits
Token       Logit
.           0.57
,           0.54
(           0.49
super       0.48
a           0.47
-           0.46
“           0.45
o           0.44
et          0.43
"           0.43
Activations Density: 7.529%