INDEX
Explanations
proper nouns related to geography and politics
Negative Logits
Token        Logit
tops         -0.77
sequence     -0.75
fri          -0.66
yright       -0.64
dain         -0.63
othe         -0.61
iple         -0.60
landlords    -0.60
asso         -0.59
Alvin        -0.59
Positive Logits
Token        Logit
XT           0.82
Yard         0.74
2020         0.72
Pavilion     0.70
ophobic      0.69
Sunrise      0.68
Clause       0.67
ite          0.66
icz          0.66
Liberation   0.65
Activations Density 0.125%