Explanations
phrases introducing questions or prompts to engage the reader
the word "have" in various contexts, indicating questions or statements about possession or experiences
Negative Logits
etter       -0.63
uit         -0.61
arter       -0.59
etting      -0.59
peak        -0.58
apo         -0.57
hesis       -0.56
dark        -0.55
stration    -0.55
deception   -0.55
Positive Logits
been         1.24
been         1.14
Been         1.06
undergone    0.93
gotten       0.92
gotten       0.88
kell         0.84
begun        0.77
asts         0.77
entertained  0.76
Activations Density: 0.165%