INDEX
Explanations
punctuation marks and formatting cues in the text
Negative Logits
[token not shown]  -0.64
O                  -0.58
-                  -0.56
sk                 -0.54
z                  -0.54
tra                -0.52
mor                -0.51
Sch                -0.51
I                  -0.51
ST                 -0.50
Positive Logits
pleaſure       1.31
GEBURTSDATUM   1.19
purpoſe        1.14
houſe          1.14
myſelf         1.13
ſelf           1.12
anſ            1.10
raiſ           1.10
diſt           1.10
UnsafeEnabled  1.08
Activations Density 0.301%