Explanations
References to romantic relationships and dating
Negative Logits
Token              Logit
AssemblyCompany    -0.42
enfance            -0.39
tričko             -0.35
vriende            -0.34
organizada         -0.32
ftagPool           -0.31
leão               -0.31
hijas              -0.31
erschutz           -0.31
padat              -0.31
Positive Logits
Token              Logit
romantic            1.52
romance             1.29
couples             1.28
Couples             1.27
Romantic            1.22
Romantic            1.21
romantic            1.19
romantique          1.13
Couple              1.10
couple              1.09
Activation density: 0.455%