Explanations
proper nouns, specifically related to politics, organizations, and individuals
specific names and titles associated with particular entities or groups
Negative Logits
Token      Logit
font       -0.74
arious     -0.68
sbm        -0.66
aturday    -0.63
BSD        -0.62
olulu      -0.62
!:         -0.62
Guinness   -0.60
NH         -0.60
;;;;       -0.60
Positive Logits
Token         Logit
cannot        0.91
hadn          0.88
forgot        0.85
withdrew      0.85
transitioned  0.84
itself        0.84
could         0.83
had           0.83
succeeded     0.82
would         0.82
Activations Density: 0.637%