Explanations
words related to romantic sentiments or relationships
references to romantic themes and relationships
Negative Logits
Token         Logit
upon          -0.94
avis          -0.86
ldon          -0.81
Downloadha    -0.77
redd          -0.73
WIN           -0.73
irin          -0.73
ulhu          -0.72
uckles        -0.72
aston         -0.72
Positive Logits
Token         Logit
romantic      0.98
ized          0.88
istic         0.86
monog         0.86
istically     0.82
ties          0.79
antically     0.77
ization       0.77
izing         0.75
romance       0.71
Activation density: 0.007%
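A density of 0.007% means the feature fires on roughly 7 of every 100,000 tokens. The sketch below shows that arithmetic. It assumes density is the percentage of tokens whose activation is strictly above a threshold of zero. The dashboard does not state its exact threshold or token sample, and `activation_density` is a hypothetical helper, not part of any dashboard API.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold (default 0.0).
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    active = sum(1 for a in acts if a > threshold)
    return 100.0 * active / len(acts)


# Hypothetical data: 100,000 tokens, 7 of which activate the feature.
acts = [0.0] * 100_000
for i in (10, 500, 2_000, 30_000, 45_000, 70_000, 99_999):
    acts[i] = 1.0

print(f"{activation_density(acts):.3f}%")  # 0.007%
```

At this density, a corpus of one million tokens would contain only about 70 activating positions. That is why such a sparse feature's explanation rests on a small set of examples.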