INDEX
Explanations
terms related to romantic relationships and couple activities
romantic date, couples, honeymoon, love partner
Negative Logits
Token            Logit
AssemblyCompany  -0.85
AssemblyCulture  -0.81
<unused42>       -0.81
<unused23>       -0.81
<unused79>       -0.81
<unused41>       -0.80
<unused43>       -0.80
<unused16>       -0.80
<pad>            -0.80
<unused8>        -0.80
Positive Logits
Token      Logit
romantic   0.76
honeymoon  0.60
Romantic   0.60
couples    0.56
Romantic   0.52
romantis   0.50
romantic   0.48
Honeymoon  0.48
Couples    0.47
romance    0.45
Activations Density 0.009%
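
A density of 0.009% corresponds to roughly 9 active token positions per 100,000. The sketch below shows one common way such a figure could be computed, assuming density means the fraction of token positions where the feature's activation exceeds a threshold. The function name `activation_density`, the zero threshold, and the sample values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of positions where the feature fires.
    The dashboard's exact definition is not stated in the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for one feature over 8 token positions.
sample = [0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

On a real corpus the list would hold one activation per token position, so a sparse feature like this one would be zero almost everywhere.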