Explanations
words related to controversial or sensitive topics
words related to specific characters or themes in storytelling
Negative Logits
Token      Logit
aign       -0.85
atform     -0.81
iated      -0.77
rowth      -0.76
iations    -0.75
iation     -0.75
roups      -0.75
resh       -0.74
igor       -0.73
agne       -0.73
Positive Logits
Token      Logit
ãĥĥãĥĪ     0.78
nuns       0.77
essee      0.75
paws       0.70
Pupp       0.69
Cly        0.67
chefs      0.67
eness      0.67
fix        0.66
culosis    0.64
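The top positive token, ãĥĥãĥĪ, looks like the raw vocabulary form of a multi-byte token rather than corrupted text. The sketch below assumes the tokens come from a GPT-2-style byte-level BPE vocabulary, which this page does not state, and reverses that scheme's byte-to-character mapping to recover readable text. The function names `bytes_to_unicode` and `decode_token` are illustrative, not part of any tool shown here.

```python
def bytes_to_unicode():
    """Rebuild the byte -> printable-character table used by GPT-2-style
    byte-level BPE. Assumption: the dashboard's model uses this scheme."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def decode_token(token):
    """Map a vocabulary string back to raw bytes, then decode as UTF-8."""
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # Tokens can split a character mid-sequence, so replace invalid bytes.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥĥãĥĪ"))  # ット
```

Under that assumption the token is the katakana sequence ット, while plain ASCII tokens such as paws decode to themselves.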
Activations Density 0.025%