INDEX
Explanations
references to famous personalities and entities such as politicians, celebrities, and sports figures in news articles
Negative Logits
mbuds       -0.74
ipedia      -0.73
noon        -0.71
fo          -0.64
Helpful     -0.63
veyard      -0.61
Crimean     -0.60
english     -0.60
Duration    -0.60
Females     -0.58
Positive Logits
opted        0.90
underwent    0.87
joked        0.85
survived     0.84
admits       0.83
tweeted      0.82
insists      0.80
penned       0.79
endured      0.78
wore         0.78
Activations Density 0.168%