Explanations
words related to political and religious figures
words and phrases related to individuals associated with extremist groups
Negative Logits
Titanic       -0.81
Chloe         -0.76
gears         -0.75
ponies        -0.74
Takeru        -0.73
erella        -0.73
Velvet        -0.73
roller        -0.72
pitchers      -0.71
butterflies   -0.70
Positive Logits
aq            1.43
awi           1.38
qqa           1.32
qa            1.23
ayn           1.20
iyah          1.17
Allah         1.13
azi           1.12
abi           1.11
Islam         1.10
Activations Density: 0.267%