INDEX
Explanations
words related to professions or roles, such as developer, candidate, person, worker, insider, investor, politician, journalist, customer, and consumer
references to specific roles or identities of individuals
NEGATIVE LOGITS
irens         -0.85
ernels        -0.78
metadata      -0.72
ories         -0.67
acements      -0.67
ummies        -0.66
atars         -0.65
Horses        -0.65
escription    -0.64
clusions      -0.64
POSITIVE LOGITS
learns         1.31
chooses        1.31
knows          1.16
decides        1.10
spends         1.09
wants          1.07
understands    1.07
behaves        1.06
earns          1.05
enjoys         1.05
ACTIVATIONS DENSITY 0.181%