Explanations
presence of formatted text or structural elements in written content
Punctuation separating parts of sentences
gender symbols and context words
Negative Logits
par      -0.81
ad       -0.77
in       -0.77
(blank)  -0.77
re       -0.73
bu       -0.72
—        -0.71
qu       -0.71
WEBPACK  -0.71
at       -0.70
Positive Logits
myſelf    1.67
Anſ       1.65
purpoſe   1.65
Monfieur  1.61
Jefus     1.58
pleaſure  1.56
Houſe     1.55
itſelf    1.54
auffi     1.54
Majefty   1.53
Activations Density: 0.594%