Explanations
mentions of specific entities (people, locations, organizations) and their associated actions or characteristics in news articles or reports
Negative Logits
Token     Logit
ceive     -0.60
imi       -0.56
ample     -0.55
ijn       -0.53
icia      -0.53
strate    -0.53
‐         -0.52
thood     -0.51
inel      -0.51
arate     -0.51
Positive Logits
Token        Logit
latter       0.93
latest       0.91
strongest    0.85
easiest      0.84
simplest     0.84
biggest      0.83
entire       0.83
toughest     0.81
vast         0.80
same         0.79
Activations Density: 8.096%