INDEX
Explanations
words related to romantic relationships
concepts related to romance and romantic relationships
Negative Logits
WTC         -0.85
ensor       -0.75
Concent     -0.69
Stack       -0.69
Blocks      -0.68
Fork        -0.68
Container   -0.67
Glob        -0.67
ACC         -0.66
Shap        -0.65
Positive Logits
romance      3.49
romantic     2.40
rom          2.38
Romance      1.94
Romantic     1.75
flirt        1.62
erotic       1.44
friendship   1.38
dating       1.36
monog        1.35
Activations Density 0.031%