INDEX
Explanations
phrases related to ongoing criminal activities or controversies
Negative Logits
!'       -0.82
!'"      -0.65
)'       -0.60
Fuck     -0.60
schild   -0.59
.'       -0.59
?'       -0.59
Illust   -0.56
.'"      -0.55
Writ     -0.54
Positive Logits
"              0.94
"…             0.92
"...           0.89
"[             0.89
"'             0.84
''             0.77
"(             0.72
"#             0.69
counselling    0.64
misunderstood  0.64
Activations Density 1.082%