INDEX
Explanations
nouns related to politics and government
Negative Logits
rompt    -0.82
heid     -0.65
Mellon   -0.65
alid     -0.64
Chains   -0.63
vest     -0.61
moil     -0.61
CARD     -0.60
icer     -0.60
edin     -0.58
Positive Logits
importantly   1.32
afa           0.99
notably       0.98
mornings      0.90
likely        0.83
likely        0.82
body          0.81
egreg         0.80
noticeable    0.77
observers     0.75
Activations Density: 0.262%