Explanations
proper nouns related to political figures, universities, and events
proper nouns, particularly names and locations
Negative Logits
>>>>>>>>      -0.76
Portug        -0.71
Chains        -0.70
Debor         -0.67
NEWS          -0.62
Episcopal     -0.62
apocalypse    -0.61
afety         -0.61
hr            -0.59
Claus         -0.59
Positive Logits
amel          0.91
alos          0.90
doms          0.86
jet           0.86
insky         0.85
aji           0.84
endor         0.83
arios         0.83
nit           0.81
deen          0.80
Activations Density: 0.023%