INDEX
Explanations
proper nouns and names of individuals
prominent figures and references to their actions or opinions
Negative Logits
owship        -0.75
Also          -0.72
Secondly      -0.70
secondly      -0.69
ationally     -0.68
arching       -0.68
qqa           -0.67
Also          -0.64
furthermore   -0.63
actly         -0.62
Positive Logits
concede       1.17
concedes      1.10
admits        1.05
admit         1.04
acknowledges  0.95
acknowledge   0.89
conceded      0.85
couldn        0.79
whiff         0.77
admitting     0.77
Activations Density 0.182%
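
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that "activation density" means the fraction of sampled tokens on which the feature's activation is above zero; the dashboard does not state its definition. The function name, the threshold parameter, and the sample data are all hypothetical and are not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: density = (number of tokens where the feature fires)
    / (number of tokens sampled). This is an illustrative definition only.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 2 of 1000 tokens.
sample = [0.0] * 998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.200%
```

Under this reading, a density of 0.182% would mean the feature is active on roughly 18 of every 10,000 sampled tokens.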