INDEX
Explanations
grammatical elements, specifically nouns, adjectives, pronouns, and verbs
references to parts of speech, particularly nouns and adjectives
Negative Logits
Token        Logit
ersen        -0.73
rared        -0.73
cler         -0.72
utherland    -0.69
ONSORED      -0.69
Nelson       -0.66
imilar       -0.65
psey         -0.65
Rider        -0.65
clerosis     -0.65
Positive Logits
Token        Logit
noun         1.03
adjective    0.96
ciation      0.96
pronouns     0.96
matical      0.92
suffix       0.90
adject       0.90
phrase       0.86
verbs        0.85
pronoun      0.83
Activations Density 0.015%
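
As a rough illustration of what a density figure like this means, the sketch below assumes that activation density is the fraction of token positions on which the feature fires above some threshold. That definition, the function name activation_density, the threshold default, and the sample data are all assumptions made for illustration; none of them come from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition for illustration; the dashboard does not state its own.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 token positions, 15 of which are active.
sample = [0.0] * 99_985 + [1.2] * 15
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this assumed definition, a density of 0.015% corresponds to the feature firing on roughly 15 of every 100,000 tokens, which fits a sparse feature tied to explicit grammar terminology.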