Explanations
phrases related to controversial or sensitive topics such as incest
references to incestuous relationships
Negative Logits
Token     Logit
eah       -0.91
¥µ        -0.86
ĪĴ        -0.84
yers      -0.82
Ĵ         -0.81
gd        -0.78
gments    -0.77
inion     -0.77
mers      -0.76
wer       -0.75
Positive Logits
Token           Logit
Frankenstein    1.11
Kafka           1.02
Dracula         0.98
incest          0.90
Malfoy          0.89
Cullen          0.88
vampires        0.85
Franz           0.85
Cassandra       0.83
ãĤ¼ãĤ¦ãĤ¹        0.81
Activations Density 0.027%
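
Some tokens in the logit lists above (for example `ãĤ¼ãĤ¦ãĤ¹`, `ĪĴ`, `Ĵ`) look like raw byte-level BPE strings rather than readable text. The sketch below shows one way to map such strings back to bytes, assuming the model uses the GPT-2 style byte-to-unicode table; the page does not state which tokenizer is in use, so that is an assumption. The names `byte_decoder`, `token_bytes`, and `decode_token` are hypothetical helpers, not part of any dashboard API.

```python
def byte_decoder():
    """Build the inverse GPT-2 byte-to-unicode table (display char -> raw byte).

    Assumption: the tokens shown on the page use the GPT-2 byte-level BPE
    display convention. The source does not confirm this.
    """
    # Bytes that already print cleanly keep their own code point.
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {chr(b): b for b in keep}
    # Every other byte is displayed as the next free code point from 256 upward.
    shift = 0
    for b in range(256):
        if b not in keep:
            table[chr(256 + shift)] = b
            shift += 1
    return table


DECODER = byte_decoder()


def token_bytes(token: str) -> bytes:
    """Map a displayed token string back to the raw bytes it stands for."""
    return bytes(DECODER[ch] for ch in token)


def decode_token(token: str) -> str:
    """Decode a displayed token as UTF-8; incomplete sequences become U+FFFD."""
    return token_bytes(token).decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤ¼ãĤ¦ãĤ¹", "ĪĴ", "Ĵ", "Frankenstein"]:
        print(repr(tok), token_bytes(tok), repr(decode_token(tok)))
```

Under that assumption, `ãĤ¼ãĤ¦ãĤ¹` decodes to the katakana ゼウス ("Zeus"), which fits the literary and mythological names elsewhere in the positive list. The short negative-logit tokens such as `Ĵ` map to lone UTF-8 continuation bytes, so they are fragments of multi-byte characters and have no readable form on their own.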