INDEX
Explanations:
- first-person singular pronouns followed by verbs indicating past actions
- first-person pronouns and expressions of conditionality or hypothetical scenarios
New Auto-Interp
Negative Logits
  srfAttach   -0.75
  Reviewer    -0.72
  Binding     -0.68
  Whitman     -0.64
  20439       -0.64
  Leaks       -0.63
  フォ        -0.63
  ッド        -0.62
  INGTON      -0.62
  Balt        -0.59

Positive Logits
  'm           1.40
  guess        1.02
  verson       0.96
  am           0.95
  've          0.93
  wish         0.85
  zzo          0.84
  assume       0.83
  suppose      0.82
  wanna        0.82
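The positive and negative logit lists are usually read as the feature's direct effect on the output vocabulary. The dashboard does not say how its values were computed, so the following is only a minimal sketch under an assumption: a feature's effect on a token's logit is taken to be the dot product of the feature's decoder direction with that token's unembedding vector, and the lists are the k largest and k smallest of those effects. The helper `top_logit_effects` and its inputs are hypothetical.

```python
from typing import Dict, List, Tuple

Effect = Tuple[str, float]


def top_logit_effects(
    decoder_direction: List[float],
    unembedding: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Effect], List[Effect]]:
    """Return (most positive, most negative) token effects, k of each.

    Assumption (hypothetical, not confirmed by the dashboard): a feature's effect
    on a token's logit is the dot product of its decoder direction with that
    token's unembedding vector.
    """
    effects = {
        token: sum(d * u for d, u in zip(decoder_direction, vector))
        for token, vector in unembedding.items()
    }
    # Sort once, ascending by effect; the two ends give the two lists.
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy 2-d example with made-up tokens and vectors (not data from this feature).
positive, negative = top_logit_effects(
    [1.0, 0.0],
    {"a": [2.0, 5.0], "b": [-1.0, 3.0], "c": [0.5, -4.0]},
    k=1,
)
print(positive, negative)  # [('a', 2.0)] [('b', -1.0)]
```

Sorting once and reading both ends keeps the two lists consistent with each other; slicing the reversed list (rather than `ranked[-k:]`) also behaves correctly when `k` is 0.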
Activation density: 0.079%
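Activation density is commonly taken to mean the share of tokens on which the feature is active. A minimal sketch, assuming "active" means an activation strictly above a threshold of zero (the threshold used for this dashboard is not stated); `activation_density` is a hypothetical helper that returns a percentage.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`.

    Assumption: "active" means activation > threshold, with threshold 0 by default;
    the dashboard does not say which threshold it used.
    """
    values = list(activations)
    if not values:
        return 0.0  # no tokens observed, so report zero density
    active = sum(1 for a in values if a > threshold)
    return 100.0 * active / len(values)


print(activation_density([0.0, 0.0, 3.2, 0.0]))  # 25.0
```

Read this way, a density of 0.079% means the feature fires on roughly 1 token in 1,270, i.e. fewer than 1 in 1,000.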