INDEX
Explanations
phrases or sentences starting with "We"
repeated mentions of the word "we."
Negative Logits
LSD            -0.66
totality       -0.65
UD             -0.64
quo            -0.63
bloc           -0.62
cum            -0.61
gratification  -0.60
ECO            -0.58
srfAttach      -0.58
PUBLIC         -0.58
Positive Logits
akening        1.24
ldon           1.21
bley           1.21
've            1.18
ird            1.16
eks            1.12
ighed          1.12
alth           1.12
eping          1.11
're            1.10
Activations Density 0.140%