INDEX
Explanations
instances of character names and titles
Negative Logits
pleaſure    -0.90
ſtand       -0.84
houſe       -0.81
ſtate       -0.81
purpoſe     -0.80
Majefty     -0.80
Jefus       -0.79
deſt        -0.77
itſelf      -0.77
uſed        -0.76
Positive Logits
@           0.75
Mr          0.74
Mr          0.62
Z           0.57
j           0.56
J           0.55
z           0.52
G           0.52
Ar          0.51
Dr          0.50
Activations Density: 0.174%