Explanations
instances of the word "pretend" followed by an action
instances of the word "pretend" and its derivatives
Negative Logits
Token        Logit
↑            -0.71
OTOS         -0.67
vez          -0.67
ccording     -0.64
Bio          -0.63
States       -0.61
interrupted  -0.60
APH          -0.59
Ger          -0.58
ilings       -0.58
Positive Logits
Token        Logit
pretend       0.82
querade       0.81
arial         0.81
pas           0.80
entious       0.76
pretending    0.75
ーティ         0.75
cha           0.74
pretended     0.74
forgot        0.71
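The logit lists rank output tokens by how strongly this feature promotes (positive) or suppresses (negative) them. A minimal sketch of one way such scores can be produced, assuming each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. The helpers `logit_effects` and `top_and_bottom` and all toy vectors are hypothetical and made up for illustration; they are not this feature's actual weights or the dashboard's own code.

```python
from typing import Dict, List, Tuple

# Assumed representation: the feature's decoder direction is a residual-stream
# vector, and the unembedding maps each token to a vector of the same size.
Vector = List[float]


def logit_effects(decoder_dir: Vector, unembed: Dict[str, Vector]) -> Dict[str, float]:
    """Score each token by the dot product of the decoder direction with its unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most promoted (positive) and k most suppressed (negative) tokens."""
    ranked = sorted(effects.items(), key=lambda item: item[1], reverse=True)
    positive = ranked[:k]
    negative = ranked[::-1][:k]  # reverse the ranking so the lowest scores come first
    return positive, negative


if __name__ == "__main__":
    # Toy 3-dimensional example with invented numbers.
    decoder_dir = [1.0, 0.0, 0.5]
    unembed = {
        "pretend": [0.8, 0.1, 0.0],
        "forgot": [0.5, 0.0, 0.4],
        "the": [0.0, 0.3, 0.0],
        "States": [-0.6, 0.2, 0.0],
    }
    pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Under this assumption, a token lands in the positive list when its unembedding vector points in roughly the same direction as the feature's decoder vector, and in the negative list when it points the opposite way.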
Activation density: 0.010%
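Activation density is read here as the share of token positions on which the feature fires at all. This minimal sketch assumes that definition, with any strictly positive activation counting as firing. `activation_density` is a hypothetical helper and the activations are invented. Under this definition, a density of 0.010% works out to about one active position per 10,000 tokens:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions at which the feature's activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Made-up activations: the feature fires on 1 of 10,000 token positions.
    acts = [0.0] * 9_999 + [2.3]
    print(f"Activation density: {activation_density(acts):.3f}%")  # prints "Activation density: 0.010%"
```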