INDEX
Explanations
words related to financial amounts and toys
Negative Logits
itſelf        -1.58
Houſe         -1.58
myſelf        -1.57
Majefty       -1.50
houſe         -1.50
Theſe         -1.45
ſelves        -1.42
pleaſure      -1.41
Jefus         -1.41
Anſ           -1.40
Positive Logits
&              0.98
"              0.85
(not shown)    0.84
(not shown)    0.83
in             0.82
↵              0.78
or             0.75
and            0.74
et             0.69
on             0.68
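The negative and positive logit lists can be sketched as follows. This assumes, as is common for sparse-autoencoder feature dashboards though not stated here, that each score is the feature's decoder vector multiplied by the model's unembedding matrix, and that the lists are simply the k lowest and k highest entries. The function name `top_logit_effects` and the argument shapes are hypothetical, and the sketch ignores any final layer-norm scaling the real dashboard may apply.

```python
import numpy as np


def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Rank tokens by how strongly this feature's decoder direction
    raises (positive) or lowers (negative) their logits.

    Hypothetical sketch: assumes w_dec_row has shape (d_model,),
    W_U has shape (d_model, d_vocab), and vocab[i] is the string for token i.
    """
    logits = w_dec_row @ W_U              # direct logit effect, shape (d_vocab,)
    order = np.argsort(logits)            # token ids sorted from lowest to highest
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy usage with a 3-token vocabulary and an identity unembedding.
neg, pos = top_logit_effects(np.array([0.5, -1.0, 2.0]), np.eye(3), ["a", "b", "c"], k=1)
print(neg, pos)
```

Sorting once and slicing from both ends keeps the two lists consistent with each other; with a real model, `W_U` would come from the model's unembedding weights.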
Activation Density: 0.146%
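Activation density is usually reported as the share of token positions on which the feature's activation is above zero. Assuming that definition (the page does not state it), 0.146% means the feature fires on roughly 1.46 of every 1,000 tokens. The helper below is a hypothetical sketch of that calculation, not the dashboard's own code.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Hypothetical sketch: `acts` is a 1-D array of one feature's
    activations over a sample of tokens.
    """
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())


# Two of eight positions are active, so the density is 0.25 (25%).
print(activation_density([0.0, 0.3, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]))
```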