Explanations
mentions of people and their roles or actions within the context of narratives
Negative Logits
embark    -0.15
626       -0.15
ameleon   -0.14
          -0.14
UND       -0.13
↵↵        -0.13
ot        -0.13
          -0.12
fen       -0.12
_fold     -0.12
Positive Logits
serves    0.27
serve     0.25
heads     0.23
served    0.23
heads     0.21
serve     0.20
runs      0.20
head      0.19
works     0.19
spear     0.19
Activations Density 0.238%