INDEX
Explanations
political and news-related terms and entities
references to media and visual elements
Negative Logits
soType            -0.77
DERR              -0.77
fired             -0.73
soDeliveryDate    -0.70
ocr               -0.67
mingham           -0.64
edited            -0.62
tested            -0.62
ample             -0.62
stated            -0.61
Positive Logits
]'                0.76
Photos            0.73
Signs             0.71
aval              0.69
Mysterious        0.68
htaking           0.68
:'                0.68
Deadly            0.65
resil             0.65
Photos            0.64
Activations Density: 0.068%