INDEX
Explanations
connections or ties to specific entities or groups, often with a negative connotation
connections to controversial or nefarious groups and individuals
Negative Logits
bra          -0.78
partName     -0.76
meal         -0.74
ipeg         -0.73
facing       -0.70
erion        -0.70
ItemImage    -0.69
reci         -0.69
hog          -0.69
iculty       -0.68
Positive Logits
extremist     1.00
billionaire   0.89
extremism     0.84
militant      0.82
extremists    0.82
Kremlin       0.80
Osama         0.80
wealthy       0.80
terrorist     0.79
discredited   0.79
Activations Density 0.267%