Explanations
References to collective actions or statements involving the word "we"
Sentences starting with "We"
"we" followed by a verb
NEGATIVE LOGITS
[token missing]  -0.51
ter              -0.50
perla            -0.49
pomo             -0.47
frapp            -0.45
valer            -0.45
ordering         -0.44
objet            -0.43
kof              -0.43
ly               -0.43
POSITIVE LOGITS
have             0.94
'):              0.93
don              0.92
can              0.91
would            0.90
never            0.89
still            0.87
didn             0.86
had              0.85
wouldn           0.84
Activations Density 0.182%
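
The density figure above can be illustrated with a minimal sketch. It assumes that activation density means the fraction of token positions on which the feature fires (activation above zero); the page itself does not state this definition. The function name, the threshold parameter, and the sample data are hypothetical and chosen only to reproduce a value of 0.182%.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature is active.
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations given")
    return active / total


# Hypothetical sample: the feature fires on 182 of 100,000 token positions.
sample = [1.0] * 182 + [0.0] * 99_818
density = activation_density(sample)
print(f"{density:.3%}")  # 0.182%
```

Under this reading, a density of 0.182% means the feature is active on roughly 2 of every 1,000 tokens, which fits a feature tied to one specific word.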