INDEX
Explanations
proper nouns such as names of people and organizations, especially those related to politics or business
Negative Logits
rier        -0.75
holder      -0.69
ivities     -0.69
selves      -0.68
rap         -0.66
ishing      -0.66
atche       -0.66
ancies      -0.65
olk         -0.62
etheless    -0.62
Positive Logits
ppo         1.17
ffic        0.99
ctl         0.97
cean        0.96
zzi         0.95
active      0.94
ÄŁ          0.93
hazard      0.91
zza         0.90
pec         0.88
Activations Density 0.033%