Explanations
references to individuals in authoritative or leadership roles
Negative Logits
ambi      -0.18
.ua       -0.17
Ķ         -0.16
fl        -0.16
ilage     -0.15
ーラ      -0.15
rating    -0.14
なる      -0.14
rug       -0.14
antage    -0.14
Positive Logits
Linden        0.15
سك            0.15
opis          0.15
/generated    0.14
tember        0.14
berman        0.14
.decorate     0.14
лия           0.13
backed        0.13
spoken        0.13
Activations Density 0.108%