INDEX
Explanations
words related to romantic relationships
references to romantic themes and relationships
Negative Logits
upon        -0.89
avis        -0.81
ldon        -0.81
aston       -0.77
Reviewer    -0.77
uckles      -0.75
irin        -0.72
aver        -0.71
ifted       -0.71
avers       -0.70
Positive Logits
romantic     1.15
monog        0.94
romance      0.84
antically    0.83
Romance      0.80
Romantic     0.77
ized         0.77
erotic       0.76
ties         0.75
consensual   0.73
Activations Density: 0.006%