INDEX
Explanations
people's names or proper nouns related to news articles or political events
Negative Logits
antry        -0.60
heit         -0.57
uddin        -0.56
isine        -0.56
rology       -0.56
enne         -0.54
expression   -0.51
someone      -0.51
itivity      -0.51
ylum         -0.50
Positive Logits
thirds        0.81
halves        0.79
paragraphs    0.70
generations   0.67
continents    0.65
sets          0.64
pairs         0.62
factions      0.61
pillars       0.60
brothers      0.60
Activations Density: 12.091%