INDEX
Explanations
things that are followed by citations, examples or code
Negative Logits
"myſelf"        -1.07
"Efq"           -1.02
"Monfieur"      -1.00
"raiſ"          -0.97
"houſe"         -0.97
"themſelves"    -0.97
"whoſe"         -0.94
"ſever"         -0.94
"vPvB"          -0.94
"tvguidetime"   -0.93
Positive Logits
","             0.85
"they"          0.56
"we"            0.54
"it"            0.53
"main"          0.51
"."             0.49
"she"           0.47
"Min"           0.47
":"             0.47
";"             0.47
Activation density: 0.599%
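The dashboard reports density as a single percentage. One plausible reading, assumed here because the page does not define it, is the share of token positions where the feature's activation is above zero. The hypothetical helper below computes that share; the `threshold` parameter is also an assumption.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumed definition: a feature "fires" on a token when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    acts = list(activations)
    if not acts:
        return 0.0
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


if __name__ == "__main__":
    # Toy example: the feature fires on 2 of 10 tokens.
    acts = [0.0, 0.0, 0.3, 0.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(f"{activation_density(acts):.3f}%")
```

Under this definition, a density of 0.599% means the feature fires on roughly 6 tokens in every 1,000.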