Explanations
Proper nouns, specifically the names of individuals and of positions in political contexts
Negative Logits
Rebellion   -0.74
Rebels      -0.73
Wonderland  -0.69
Cerberus    -0.68
Leviathan   -0.68
theaters    -0.66
robbers     -0.65
Colleges    -0.65
womb        -0.64
machines    -0.64
Positive Logits
ergus   1.01
imir    0.96
frey    0.96
ileen   0.95
jit     0.95
ureen   0.94
iji     0.93
anya    0.93
resa    0.93
jan     0.91
Activation density: 0.174%