Explanations
issues related to personal rights and freedoms
Negative Logits
FUCK  -0.16
χη  -0.16
fuck  -0.15
fucking  -0.15
shitty  -0.15
Fuck  -0.15
sorts  -0.15
sort  -0.15
Sorted  -0.15
чат  -0.14
Positive Logits
um  0.23
uh  0.23
--  0.22
--  0.22
ya  0.18
sir  0.17
-  0.17
(  0.16
{}  0.16
--,  0.15
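The page does not say how the logit lists are computed. A common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and read off the most negative and most positive entries. The sketch below follows that assumption, ignores the final layer norm, and uses hypothetical names (`top_logit_effects`, `decoder_dir`, `W_U`, `vocab`) that do not come from the page.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by how much a feature's decoder direction lowers or
    raises their logits.

    Hypothetical helper. Assumes the feature acts on the logits only through
    the direct path decoder_dir @ W_U (no final layer norm, no later layers).
    decoder_dir: shape (d_model,); W_U: shape (d_model, vocab_size).
    """
    effects = decoder_dir @ W_U          # per-token logit change, shape (vocab_size,)
    order = np.argsort(effects)          # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive
```

A single `argsort` serves both lists: its head gives the most suppressed tokens and its reversed head gives the most promoted ones.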
Activation Density: 0.075%
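Activation density is presumably the share of sampled tokens on which the feature activates at all. A minimal sketch under that assumption; the zero threshold and the hypothetical `activation_density` helper are not from the page:

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of sampled tokens whose activation exceeds `threshold`.

    Hypothetical helper: assumes "density" means firing frequency over a
    token sample, counting any activation above `threshold` as firing.
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy sample: the feature fires on 3 of 4,000 tokens.
sample = np.zeros(4000)
sample[[10, 500, 3999]] = [1.2, 0.4, 2.0]
print(activation_density(sample))  # → 0.075
```

Under this reading, a density of 0.075% corresponds to roughly 3 firing tokens per 4,000 sampled.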