INDEX
Explanations
the presence of various types of parentheses and quotation marks in the text
Negative Logits
eger      -0.57
hooter    -0.55
__.__     -0.54
iedler    -0.54
tem       -0.52
Starr     -0.52
Spiel     -0.51
ništ      -0.51
Nath      -0.49
Seidel    -0.48
Positive Logits
Majefty       0.82
Cæsar         0.80
Efq           0.79
"\<           0.74
kubwa         0.73
erçe          0.72
fhort         0.71
Shakspeare    0.71
ंदीखरीदारी    0.70
itſelf        0.70
Activations Density 0.019%