INDEX
Explanations
phrases related to negative behaviors and consequences, such as drug abuse and failure
keywords related to social and political issues
Negative Logits
Token           Logit
SPONSORED       -0.81
etheless        -0.67
EMBER           -0.64
EMP             -0.62
)]              -0.62
enter           -0.61
zik             -0.61
]]              -0.61
Recommend       -0.60
largeDownload   -0.60
Positive Logits
Token           Logit
notwithstanding 0.85
etc             0.80
everywhere      0.74
!),             0.72
gal             0.71
abound          0.70
!,              0.68
or              0.67
buzzing         0.65
mattered        0.65
Activations Density 1.238%