INDEX
Explanations
references to anthropological concepts and figures
Negative Logits
rush      -0.17
vida      -0.15
utra      -0.14
ouz       -0.14
جÙĦ       -0.14
ãģŁãĤī    -0.14
merc      -0.14
ounge     -0.14
fsp       -0.14
ensch     -0.13
Positive Logits
cket      0.16
ald       0.15
nist      0.15
istle     0.15
Hin       0.15
ORY       0.15
abe       0.14
viên      0.14
antagon   0.14
gunakan   0.14
Activations Density 0.181%