INDEX
Explanations
references to a particular individual's attributes or actions
Negative Logits
Token       Logit
Monfieur    -0.68
נטרנט       -0.61
Eul         -0.60
Agamemnon   -0.60
Jegyzetek   -0.59
allez       -0.59
Allez       -0.59
تبد         -0.58
Dede        -0.58
ally        -0.58
Positive Logits
Token       Logit
his         1.68
HIS         1.51
her         1.45
HIS         1.43
His         1.42
His         1.30
his         1.28
own         1.25
Her         1.23
Her         1.08
Activations Density: 0.147%