INDEX
Explanations
proper nouns and technical terms related to current events and policies
Negative Logits
itialized    -0.66
bler         -0.66
erate        -0.65
ership       -0.63
ggle         -0.63
gib          -0.62
gery         -0.61
Control      -0.61
ospel        -0.60
perpetual    -0.60
Positive Logits
afternoon    0.90
night        0.85
evening      0.78
announcing   0.78
days         0.76
flower       0.76
unveiling    0.75
morning      0.71
ghan         0.71
marked       0.70
Activations Density: 0.073%