INDEX
Explanations
mentions of things being imitated or imitating something else
words related to being "immoral" or "immorality."
Negative Logits
OPLE          -0.68
Morales       -0.67
escription    -0.67
Downloadha    -0.66
Sack          -0.66
Rav           -0.65
NetMessage    -0.65
ttes          -0.63
Wales         -0.62
ij士          -0.61
Positive Logits
itating       1.15
balanced      1.14
mer           1.12
manent        1.08
itates        1.08
itations      1.06
itated        1.05
mers          1.00
bal           0.98
bec           0.93
Activations Density 0.021%