INDEX
Explanations
instances where an entity or organization is being described or referred to
references to historical or formal events and organizations
Negative Logits
bugs       -0.86
whiff      -0.77
Trash      -0.77
bug        -0.75
selfies    -0.73
plaint     -0.73
smells     -0.72
smell      -0.70
prank      -0.70
redd       -0.67
Positive Logits
encomp        1.05
concurrently  1.02
expanded      0.99
jointly       0.98
successor     0.98
unified       0.97
subdiv        0.96
predecessor   0.94
annex         0.93
merged        0.92
Activations Density 1.002%