INDEX
Explanations
references to the word "it" in various forms
Negative Logits
houſe       -1.55
Houſe       -1.52
Monfieur    -1.52
Majefty     -1.48
pleaſure    -1.48
Diſ         -1.39
Conſ        -1.38
ſever       -1.35
myſelf      -1.35
Efq         -1.34
Positive Logits
(token not captured)    0.91
is          0.90
It          0.89
it          0.85
"           0.77
I           0.76
was         0.75
'           0.73
p           0.71
its         0.68
Activations Density 0.004%
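
The density figure above is most naturally read as the fraction of token positions on which the feature fires. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the page, which does not state how the figure was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (tokens where the feature fires) / (all tokens).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: the feature fires on 4 of 100,000 token positions.
sample = [0.0] * 99_996 + [1.2, 0.7, 3.1, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.004%
```

Under this reading, a density of 0.004% corresponds to roughly 4 firing positions per 100,000 tokens, which would make this a very sparse feature.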