INDEX
Explanations
objects related to physical injuries or attacks
Negative Logits
urches          -0.84
nces            -0.80
Parties         -0.80
rencies         -0.80
acron           -0.79
Governments     -0.79
portfolios      -0.78
sites           -0.78
publications    -0.77
embassies       -0.77
Positive Logits
icum            0.86
assian          0.78
load            0.78
few             0.77
ogram           0.76
asket           0.76
imeter          0.74
plane           0.73
oscope          0.73
hole            0.73
Activations Density: 0.488%