Explanations
names of individuals and associated activities in contexts involving human rights
Negative Logits
NX              -0.88
Californ        -0.83
FANTASY         -0.82
Titanic         -0.82
Uncharted       -0.82
Interstellar    -0.81
conventions     -0.76
ponies          -0.76
revolving       -0.75
Jurassic        -0.74
Positive Logits
llah            1.33
aq              1.29
Hasan           1.27
azi             1.23
ullah           1.21
jad             1.20
hammad          1.20
rahim           1.16
Muhammad        1.14
Hassan          1.13
Activations Density: 0.092%