INDEX
Explanations
references to boycotts and their implications
Negative Logits
Token      Logit
inet       -0.16
پست        -0.16
qus        -0.15
ibal       -0.15
kola       -0.15
claimer    -0.14
วน         -0.14
emet       -0.14
ibling     -0.14
ola        -0.13
Positive Logits
Token      Logit
Boy        0.47
boycott    0.41
div        0.35
boyc       0.34
BDS        0.30
Boy        0.28
бой        0.28
-bo        0.27
boy        0.26
bo         0.25
Activations Density 0.008%