INDEX
Explanations
phrases related to spreading information, specifically encouraging others to share and promote content
Negative Logits
deen            -0.80
*/(             -0.73
Zup             -0.72
clamation       -0.69
herty           -0.66
--+             -0.65
tarians         -0.64
venge           -0.64
Starship        -0.62
udeau           -0.62
Positive Logits
sheets          1.84
sheet           1.38
misinformation  0.93
shirt           0.88
disinformation  0.82
awareness       0.82
geographically  0.80
across          0.80
spreads         0.80
pread           0.78
Activations Density 0.036%