INDEX
Explanations
words associated with drama and intensity in narratives
Negative Logits
latter    -0.20
ie        -0.17
bie       -0.16
bil       -0.15
ieg       -0.15
ies       -0.15
auer      -0.15
oo        -0.15
bag       -0.15
edom      -0.14
Positive Logits
íĭ±          0.20
atically     0.19
atic         0.18
-document    0.18
queen        0.18
/com         0.16
queens       0.16
llama        0.16
usp          0.15
s            0.15
Activations Density: 0.023%