Explanations
punctuation and formatting elements in the text
Negative Logits
Token     Logit
ythe      -0.20
manship   -0.15
ovah      -0.15
eren      -0.15
λιά       -0.15
heid      -0.15
ë¹Ļ       -0.15
nung      -0.14
hyth      -0.14
oller     -0.14
Positive Logits
Token     Logit
mai       0.17
stin      0.16
evin      0.15
Maar      0.14
Waters    0.14
bald      0.14
asers     0.14
oder      0.14
inton     0.14
assy      0.14
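Lists like the two above rank vocabulary tokens by a per-token score and keep the most negative and most positive entries. The following is a minimal sketch of that ranking step only, assuming a precomputed mapping from token to score; it uses a subset of the values shown above as sample data, and `logit_effects` and `top_k` are hypothetical names, not part of any dashboard API.

```python
# Hypothetical sketch: pick the top-k positive and negative entries from a
# token -> logit-effect mapping. Values are a subset of those listed above;
# a real dashboard would score the whole vocabulary.
logit_effects = {
    "ythe": -0.20,
    "manship": -0.15,
    "ovah": -0.15,
    "mai": 0.17,
    "stin": 0.16,
    "evin": 0.15,
}


def top_k(effects, k, largest=True):
    """Return the k (token, value) pairs with the largest (or smallest) values."""
    # sorted() is stable, so tokens with equal scores keep their input order.
    ordered = sorted(effects.items(), key=lambda kv: kv[1], reverse=largest)
    return ordered[:k]


print(top_k(logit_effects, 2, largest=True))   # [('mai', 0.17), ('stin', 0.16)]
print(top_k(logit_effects, 2, largest=False))  # [('ythe', -0.2), ('manship', -0.15)]
```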
Activation Density: 0.023%
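A density of 0.023% means the feature is active on roughly 23 of every 100,000 token positions. The sketch below shows that arithmetic, assuming density is defined as the fraction of token positions whose activation exceeds a threshold of zero; the activation data and the `activation_density` helper are hypothetical, constructed only to reproduce the reported figure.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical data: 100,000 token positions, of which 23 are nonzero.
acts = [0.0] * 100_000
for i in range(23):
    acts[i * 1000] = 1.5  # arbitrary positive activation value

density = activation_density(acts)
print(f"{density:.3%}")  # 0.023%
```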