INDEX
Explanations
mathematical notation involving bracket or brace symbols
Negative Logits
o           -0.89
ा           -0.70
to          -0.68
            -0.68
            -0.67
op          -0.63
DC          -0.63
(           -0.62
Mendez      -0.62
la          -0.62
Positive Logits
Jefus       1.48
myſelf      1.44
themſelves  1.38
himſelf     1.35
ſeveral     1.32
itſelf      1.32
purpoſe     1.30
Efq         1.27
poffible    1.27
pleaſure    1.27
Activations Density 0.007%