INDEX
Explanations
proper nouns related to politics or geographical locations
proper nouns related to specific places and individuals
Negative Logits
uration   -0.89
taboola   -0.86
raltar    -0.84
urable    -0.81
hops      -0.81
rar       -0.80
edin      -0.79
rams      -0.79
raped     -0.76
erness    -0.75
Positive Logits
Leone      0.75
Veter      0.73
lings      0.70
pora       0.69
Sap        0.68
Viper      0.66
Cla        0.64
Mub        0.63
Yad        0.63
Reaction   0.61
Activations Density: 0.050%