INDEX
Explanations
references to romantic partners
Negative Logits
cale        -0.80
aston       -0.79
anism       -0.74
ihil        -0.71
ouses       -0.69
iard        -0.66
ceilings    -0.64
aji         -0.64
thur        -0.63
Paris       -0.63
Positive Logits
partner      1.17
partners     1.00
Partner      0.85
ãĤ´ãĥ³       0.76
competitor   0.65
hood         0.64
colleague    0.64
loo          0.63
Karin        0.63
itute        0.62
Activations Density 0.012%
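
A density figure like this is commonly read as the fraction of token positions on which the feature has a nonzero activation. The sketch below assumes that definition, which the dashboard does not spell out, and uses made-up activation values purely to show the arithmetic. `activation_density` is a hypothetical helper.

```python
def activation_density(activations):
    """Fraction of positions where the feature fires (activation > 0)."""
    total = len(activations)
    if total == 0:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return firing / total


# Hypothetical data: the feature fires on 3 of 25,000 token positions.
acts = [0.0] * 24_997 + [0.7, 1.3, 2.1]
print(f"{activation_density(acts):.3%}")  # prints 0.012%
```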