Explanations
personal pronouns and verbs indicating speaking, writing, or explaining from a first-person perspective
Negative Logits
Token       Logit
noon        -0.73
acters      -0.69
estial      -0.63
ictional    -0.61
rocket      -0.60
selection   -0.60
iencies     -0.60
torch       -0.59
paralle     -0.59
cloning     -0.58
Positive Logits
Token       Logit
said        1.47
said        1.25
wrote       1.23
says        1.23
exclaimed   1.21
explained   1.14
Said        1.13
replied     1.13
joked       1.10
told        1.09
Activations Density 0.836%