INDEX
Explanations
words related to relationships or partnerships
references to companionship or romantic partnerships
Negative Logits
uman  -0.73
uther  -0.73
icum  -0.71
athered  -0.67
ights  -0.66
english  -0.65
undred  -0.65
Syrian  -0.64
ERROR  -0.64
eye  -0.64
Positive Logits
mate  1.35
Mate  1.09
mates  1.08
mate  0.79
daq  0.74
volent  0.73
Amir  0.73
Nieto  0.72
||||  0.71
Markus  0.70
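Feature dashboards of this kind commonly derive their positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. Whether this page used exactly that procedure is an assumption. The sketch below shows the idea in pure Python. The function name `top_and_bottom_logits`, the `[d_model][vocab_size]` layout of `unembedding`, and the toy inputs are all hypothetical.

```python
from typing import List, Tuple

def top_and_bottom_logits(
    decoder_direction: List[float],
    unembedding: List[List[float]],
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical sketch: score every token by projecting a feature's decoder
    direction through the unembedding, then return the k highest and k lowest."""
    # Assumption: unembedding is laid out as [d_model][vocab_size]
    d_model = len(decoder_direction)
    scores = []
    for token_idx, token in enumerate(vocab):
        # Dot product of the decoder direction with this token's unembedding column
        score = sum(decoder_direction[i] * unembedding[i][token_idx] for i in range(d_model))
        scores.append((token, score))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]                  # most negative first
    positive = list(reversed(ranked))[:k]  # most positive first
    return positive, negative

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary
pos, neg = top_and_bottom_logits(
    [1.0, 0.5],
    [[1, -1, 0, 2], [0, 2, -2, 0]],
    ["a", "b", "c", "d"],
    k=2,
)
print(pos)  # [('d', 2.0), ('a', 1.0)]
print(neg)  # [('c', -1.0), ('b', 0.0)]
```

Under this reading, a token such as "mate" scoring 1.35 means that writing this feature's direction into the residual stream raises that token's logit more than any other token's.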
Activation Density: 0.006%
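Activation density is usually read as the percentage of tokens on which the feature is active. A density of 0.006% therefore corresponds to roughly 6 active tokens per 100,000. The sketch below assumes "active" means an activation strictly above a threshold of 0.0. That threshold and the function name `activation_density` are hypothetical; the dashboard's exact definition is not stated here.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical sketch: percentage of tokens whose feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)  # count firing tokens
    return 100.0 * active / len(activations)               # express as a percentage

# 6 firing tokens out of 100,000 gives the density shown above
acts = [0.0] * 99_994 + [1.0] * 6
print(activation_density(acts))
```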