INDEX
Explanations
personal pronouns followed by an action or description
first-person pronouns and assertions of personal experience
Negative Logits
unavailable      -0.64
cknow            -0.63
considerably     -0.62
empt             -0.59
independently    -0.58
uncertain        -0.58
unable           -0.58
knowledge        -0.57
viol             -0.56
Xi               -0.56
Positive Logits
meant            0.90
envisioned       0.89
hoped            0.84
preached         0.80
stri             0.78
boils            0.78
Wanted           0.77
intended         0.77
wanted           0.76
supposed         0.72
Activations Density: 0.158%