INDEX
Explanations
adjectives related to love or romantic relationships
references to romantic themes and relationships
Negative Logits
avis          -0.93
upon          -0.89
Downloadha    -0.88
ividual       -0.80
ulhu          -0.75
ridges        -0.73
ktop          -0.73
paio          -0.72
ldon          -0.72
*/(           -0.70
Positive Logits
ized          0.90
ization       0.85
romantic      0.82
comed         0.82
istic         0.81
izing         0.80
istically     0.79
ties          0.78
comedy        0.76
monog         0.75
Activations Density 0.021%
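
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero; the function name, the threshold parameter, and the sample data are all hypothetical and not taken from the dashboard:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 10,000 token positions, 2 of which activate the feature.
sample = [0.0] * 9998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.021% would mean the feature fires on roughly 2 out of every 10,000 tokens, which fits a sparse, topic-specific feature.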