INDEX
Explanations
references to historical or biblical figures and their actions or claims
Negative Logits
enha     -0.16
zin      -0.15
orre     -0.15
jom      -0.14
лит      -0.14
omal     -0.14
ariat    -0.14
ipur     -0.14
veled    -0.14
orias    -0.13
Positive Logits
sdk      0.17
γον      0.16
ipse     0.15
igy      0.14
taxing   0.14
dsl      0.14
dash     0.14
اطر      0.13
ngine    0.13
OCK      0.13
Activations Density 0.141%