Explanations
sentences with direct quotes
punctuation marks and various forms of written expression
New Auto-Interp
Negative Logits
atos      -0.77
agos      -0.70
cipled    -0.69
shootout  -0.67
III       -0.65
rio       -0.65
oteric    -0.64
ciples    -0.63
trucks    -0.61
rifles    -0.61
Positive Logits
she       1.93
she       1.91
She       1.89
She       1.78
her       1.76
hers      1.73
SHE       1.70
Her       1.64
Ms        1.50
Her       1.48
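The page does not say how the Positive Logits and Negative Logits lists were computed. A common approach, assumed here, is to project the feature's decoder direction through the model's unembedding and rank vocabulary tokens by the resulting score, ignoring the final layer norm. The sketch below shows that ranking on made-up two-dimensional vectors. The function names `logit_weights` and `top_and_bottom` and all numbers are hypothetical and are not real model weights.

```python
# Hypothetical sketch: direct logit effect of a single feature.
# Assumption: the feature writes a decoder vector into the residual stream,
# and each token has an unembedding column; the final layer norm is ignored.

def logit_weights(decoder_vec, unembed):
    """Score each token by the dot product of the feature's decoder vector
    with that token's unembedding column.

    decoder_vec: list of floats, length d_model
    unembed: dict mapping token -> list of floats (length d_model)
    """
    return {
        tok: sum(d * u for d, u in zip(decoder_vec, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(scores, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Toy vectors chosen only for illustration (not taken from any model).
decoder_vec = [1.0, -0.5]
unembed = {
    "she": [1.5, -0.8],
    "her": [1.2, -0.9],
    "trucks": [-0.4, 0.6],
    "rifles": [-0.3, 0.7],
}

scores = logit_weights(decoder_vec, unembed)
top, bottom = top_and_bottom(scores, k=2)
print("positive:", top)
print("negative:", bottom)
```

In this toy setup, "she" and "her" point in the same direction as the decoder vector and rank highest, while "trucks" and "rifles" point away from it and rank lowest.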
Activation Density: 1.339%
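Activation density is usually read as the fraction of token positions on which the feature is active. The sketch below computes it under two assumptions: "active" means an activation strictly above a threshold of 0, and the activations are a flat list with one value per token. The helper name `activation_density` is hypothetical, and the toy data is invented rather than the data behind the 1.339% figure.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Invented example: the feature fires on 3 of 1,000 tokens.
acts = [0.0] * 997 + [0.5, 1.2, 3.0]
density = activation_density(acts)
print(f"Activation density: {density:.3%}")
```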