INDEX
Explanations
phrases in the format of "______ argues _____" with a focus on success and emotional wellbeing
the presence of a specific symbol or character repeated throughout the text
Negative Logits
oxide       -0.71
welf        -0.70
adulthood   -0.65
estranged   -0.65
dock        -0.64
charger     -0.64
antip       -0.64
accomp      -0.63
pped        -0.62
ignition    -0.62
Positive Logits
WHERE       1.11
yet         1.05
wait        1.04
there       1.02
they        1.01
why         1.00
BUT         0.97
among       0.97
until       0.96
sort        0.95
Activations Density 0.046%