INDEX
Explanations
mentions of different groups or numbers of items in a list
key elements or metrics in contrasting situations
NEGATIVE LOGITS
GOODMAN      -0.98
Christy      -0.70
PLUS         -0.69
Zionism      -0.67
Brigham      -0.64
Franz        -0.63
Regist       -0.62
Clarks       -0.62
Redemption   -0.62
Wrong        -0.61
POSITIVE LOGITS
others       0.80
Others       0.79
umber        0.74
empl         0.70
aughed       0.69
phthal       0.67
iliary       0.66
rest         0.65
twe          0.64
thro         0.64
Activations Density 0.218%