INDEX
Explanations
phrases starting with question words followed by a verb
pronouns that indicate a sense of community or collective action
Negative Logits
ces        -0.87
76561      -0.80
ooks       -0.75
Par        -0.71
assembly   -0.71
assemb     -0.68
ories      -0.67
ching      -0.67
opens      -0.65
yrights    -0.65
Positive Logits
succeed    1.04
be         0.95
accept     0.92
continue   0.92
concede    0.89
abandon    0.88
overwrite  0.87
tolerate   0.87
persist    0.86
differ     0.86
Activations Density: 0.051%