INDEX
Explanations
proper nouns, potentially related to news articles or other text comprising a mixture of letters and numbers
specific acronyms or abbreviations often related to organizations or government entities
Negative Logits
hol      -0.91
fe       -0.82
iors     -0.81
itant    -0.74
aign     -0.74
omore    -0.74
ite      -0.74
gard     -0.73
hor      -0.73
auga     -0.73
Positive Logits
IRO      1.66
IMAGES   1.30
IRED     1.29
ION      1.28
ITAL     1.27
ORN      1.25
ELY      1.20
ECT      1.19
ATOR     1.19
IS       1.18
Activations Density 0.026%