Explanations
words related to things that are prohibited or restricted
references to social taboos and restrictions
Negative Logits
Token         Logit
teness        -0.99
atche         -0.81
owing         -0.80
Kear          -0.79
kinson        -0.77
annis         -0.77
HCR           -0.74
代            -0.74
yrim          -0.73
Downloadha    -0.72
Positive Logits
Token         Logit
taboo         1.26
fetish        0.81
weap          0.68
reth          0.65
Myth          0.64
disse         0.61
scrimmage     0.61
decree        0.60
genital       0.60
Belt          0.60
Activations Density: 0.008%
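
A minimal sketch of how a density figure like this could be computed, assuming that activation density means the percentage of sampled tokens on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all. The page itself does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: one active token out of 12,500.
sample = [0.0] * 12_499 + [3.2]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under that assumption, a density of 0.008% corresponds to roughly one active token in every 12,500 sampled.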