Explanations
references to historical or geographical locations
Negative Logits
avez      -0.16
zure      -0.16
zan       -0.16
zman      -0.15
@nate     -0.15
léd       -0.15
ーツ      -0.15
ائب       -0.14
vak       -0.14
zee       -0.14
Positive Logits
aph       0.31
azy       0.31
ushed     0.29
allowed   0.26
ith       0.24
irs       0.23
istr      0.22
asty      0.22
ards      0.22
odge      0.22
Activation Density: 0.012%