Explanations
references to sexual content
mentions of sexual relationships or activities
Negative Logits
BLIC         -0.76
ģĸ           -0.76
£ı           -0.73
Ĭ±           -0.72
retri        -0.69
REC          -0.69
¶ħ           -0.67
Lenn         -0.67
citiz        -0.66
EMP          -0.66
Positive Logits
ido          0.95
ily          0.88
iest         0.86
ually        0.86
trafficking  0.85
bags         0.82
liest        0.82
addle        0.81
iness        0.79
bag          0.75
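The two logit lists rank vocabulary tokens by how strongly this feature's output direction pushes each token's logit down (negative) or up (positive). A common way to produce such lists is to take the dot product of the feature's decoder direction with each token's column of the model's unembedding matrix and sort the results. The sketch below assumes that method; the dashboard may additionally rescale or normalize its values. The function names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical and not taken from any specific tool.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Assumed method: dot product of the feature's decoder direction
    with each token's unembedding column (its direct effect on that logit)."""
    return {
        tok: sum(d * w for d, w in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }

def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

# Toy example: a 2-dimensional model with four hypothetical tokens.
effects = logit_effects(
    [1.0, 0.0],
    {"a": [0.5, 9.0], "b": [-0.25, 1.0], "c": [0.75, -3.0], "d": [-1.0, 0.0]},
)
neg, pos = top_and_bottom(effects, k=2)
print("negative:", neg)
print("positive:", pos)
```

With real weights, `decoder_dir` would be the feature's row of the autoencoder decoder and `unembed` the model's unembedding; a matrix library would replace the explicit loops for a full vocabulary.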
Activation Density 0.016%
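Activation density is the fraction of tokens on which the feature fires. As a percentage, 0.016% is 0.00016, or roughly one token in 6,250, so this feature is very sparse. The sketch below assumes a token counts as "firing" when its activation is strictly positive, which is typical for ReLU-based sparse autoencoders; the `activation_density` helper and its `threshold` parameter are hypothetical.

```python
import math
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`.
    Assumption: a threshold of 0.0 means 'the feature fires at all'."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# One firing token out of 6,250 gives a density of 0.016%.
acts = [0.0] * 6249 + [1.3]
density = activation_density(acts)
print(f"{density * 100:.3f}%")
```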