Explanations
references to an individual's career trajectory and accomplishments
Negative Logits
poke       -0.15
zes        -0.15
itarian    -0.15
illon      -0.15
ategory    -0.14
ples       -0.14
rics       -0.14
dera       -0.13
ανά        -0.13
(animated  -0.13
Positive Logits
-long         0.18
-ending       0.17
boyunca       0.17
-threatening  0.17
AGO           0.16
-span         0.15
552           0.14
ago           0.14
RelativeTo    0.14
spent         0.14
Activations Density 0.061%