INDEX
Explanations
phrases related to ethical considerations, societal issues, and political implications
concepts related to health and well-being in various contexts
Negative Logits
uscript    -0.83
arger      -0.70
mouth      -0.69
aunder     -0.68
pton       -0.67
enium      -0.67
prus       -0.67
Jelly      -0.66
dra        -0.65
oop        -0.64
Positive Logits
motivating   1.03
motiv        1.01
priority     0.99
paramount    0.95
trump        0.94
badge        0.94
criterion    0.94
disqual      0.93
precedence   0.92
motivation   0.91
Activations Density: 0.888%