Explanations
quotes and dialogue within the text
Negative Logits
lover      -0.16
fold       -0.16
inth       -0.16
aginator   -0.16
fold       -0.16
andler     -0.15
Fold       -0.15
avan       -0.14
mium       -0.14
)((((      -0.14
Positive Logits
βάλ        0.15
Patt       0.15
cheng      0.14
dit        0.14
olley      0.14
ir         0.14
ostel      0.13
iets       0.13
iciel      0.13
\Unit      0.13
Activations Density 0.016%