Explanations
punctuation marks, particularly periods
Negative Logits
pleaſure    -0.71
deſt        -0.68
raiſ        -0.68
houſe       -0.66
ſche        -0.66
ſtre        -0.65
ſta         -0.65
ſever       -0.63
ſelf        -0.63
发表于      -0.63
Positive Logits
.           0.52
Dare        0.48
Lis         0.48
collection  0.46
Ser         0.46
Care        0.45
Bere        0.45
s           0.44
Kle         0.44
Allen       0.44
Activations Density 0.000%