INDEX
Explanations
occurrences of the word "We" followed by varying contexts
instances of the pronoun "We"
Negative Logits
quo                  -0.75
gratification        -0.75
LSD                  -0.74
cum                  -0.64
reinforcement        -0.61
PUBLIC               -0.60
UD                   -0.59
srfAttach            -0.58
guiActiveUnfocused   -0.57
rival                -0.57
Positive Logits
bsite                 1.08
ldon                  1.08
bley                  1.01
ighed                 0.99
've                   0.99
akening               0.98
alth                  0.98
chwitz                0.98
're                   0.97
asel                  0.96
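Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding and ranking the vocabulary. The sketch below assumes exactly that (each value is the dot product of the decoder vector with a token's unembedding column, with layer norm and biases ignored); the helper names `logit_effects` and `top_and_bottom` and all the numbers are hypothetical toy data, not the values shown here.

```python
# Hypothetical sketch: rank vocabulary tokens by the direct logit effect of one
# feature's decoder direction. Assumption: effect(token) = dot(decoder_dir, W_U[:, token]).

def logit_effects(decoder_dir, unembed):
    """decoder_dir: list of d_model floats.
    unembed: dict mapping token -> its unembedding column (list of d_model floats)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }

def top_and_bottom(effects, k):
    """Return (k most positive, k most negative) as lists of (token, value)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]
    negative = sorted(ranked[-k:], key=lambda kv: kv[1])  # most negative first
    return positive, negative

# Toy example with made-up vectors.
decoder_dir = [1.0, -0.5]
unembed = {
    "'ve": [0.8, -0.4],
    "'re": [0.7, -0.5],
    "rival": [-0.3, 0.6],
    "quo": [-0.5, 0.5],
}
effects = logit_effects(decoder_dir, unembed)
pos, neg = top_and_bottom(effects, 2)
print(pos)  # tokens the feature pushes up
print(neg)  # tokens the feature pushes down
```

In a real model the same ranking would be done over the full vocabulary with matrix operations; the dictionary form here only keeps the example self-contained.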
Activation Density: 0.125%
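A density of 0.125% means the feature fires on roughly 1 in every 800 token positions. A minimal sketch of that calculation, assuming density is the fraction of positions whose activation is strictly greater than zero (the `activation_density` helper and the toy activations are hypothetical):

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Toy data: one firing position out of 800.
acts = [0.0] * 799 + [2.3]
density = activation_density(acts)
print(f"{density:.3%}")
```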