INDEX
Explanations
content related to religious themes or mentions
Negative Logits
Token      Logit
piece      -0.16
\Active    -0.16
-piece     -0.15
ialized    -0.15
Uran       -0.14
stanov     -0.14
ssc        -0.13
finity     -0.13
assage     -0.13
Ïģιά       -0.13
Positive Logits
Token      Logit
ones       0.18
ly         0.17
LY         0.16
vido       0.15
IPS        0.15
EH         0.15
489        0.15
auga       0.14
ÏĦεÏħ      0.14
ONES       0.14
Activations Density: 0.008%