INDEX
Explanations
people's names
the presence of the term "Womb."
Negative Logits
SEA        -0.76
âĸ¬        -0.75
ãģ®éŃĶ     -0.72
AMS        -0.71
FANTASY    -0.69
çĭ         -0.68
BOOK       -0.68
PLAY       -0.67
âĸ¬âĸ¬     -0.66
×ķ         -0.64
Positive Logits
omb        1.18
ilib       1.01
odies      0.91
icz        0.88
ombs       0.87
inis       0.87
ont        0.84
uds        0.84
ody        0.84
edded      0.82
Activations Density 0.009%