Explanations
references to the word "her" and its variations
Negative Logits
Monfieur    -0.87
raiſ        -0.86
Houſe       -0.82
uſe         -0.78
Оно         -0.77
purpoſe     -0.74
Efq         -0.74
cauſe       -0.73
pleaſure    -0.72
uſed        -0.70
Positive Logits
her         3.01
her         2.01
Her         1.89
HER         1.89
Her         1.84
his         1.68
她的        1.62
hennes      1.57
HER         1.53
hers        1.50
Activations Density 0.056%