Explanations
words related to specific religious text references, physical places, and proper nouns
specific symbols or terms associated with a structured format or pattern in the text
Negative Logits
ierrez        -0.96
eln           -0.76
arnaev        -0.75
ablishment    -0.71
Muller        -0.70
osit          -0.70
osta          -0.69
oration       -0.69
oleon         -0.68
uncture       -0.68
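Lists like this one, and the positive list that follows, are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the per-token scores. The page does not say how its logits were computed, so the following is only a minimal sketch of that ranking step under that assumption. The function names `feature_logits` and `top_and_bottom`, and the toy vocabulary and matrices, are hypothetical.

```python
# Minimal sketch (assumed procedure): score every vocabulary token by the dot
# product of one feature's decoder direction with the unembedding matrix, then
# take the k most positive and k most negative scores. Toy data only.


def feature_logits(decoder_dir, unembed):
    """Return one score per vocabulary token: decoder_dir @ unembed.

    decoder_dir: list of d_model floats (hypothetical feature decoder vector)
    unembed: d_model rows, each a list of vocab_size floats (hypothetical W_U)
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(logits, vocab, k):
    """Return (k most positive, k most negative) (token, score) pairs."""
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative score first
    positive = ranked[::-1][:k]    # most positive score first
    return positive, negative


if __name__ == "__main__":
    vocab = ["a", "b", "c", "d"]
    decoder_dir = [1.0, -0.5]
    unembed = [
        [0.2, -1.0, 0.5, 0.0],
        [0.4, 0.2, -1.0, 0.6],
    ]
    pos, neg = top_and_bottom(feature_logits(decoder_dir, unembed), vocab, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

In the toy run, token `c` scores highest and `b` lowest. A real dashboard would apply the same ranking over the full vocabulary, possibly with extra normalization that this sketch omits.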
Positive Logits
Ø        1.38
¹        1.14
Ù        1.05
اÙĦ      0.99
Ùħ       0.99
Ù        0.98
اØ       0.98
©        0.96
¨        0.96
Ø        0.93
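Characters such as Ħ and ħ in these tokens suggest a GPT-2-style byte-level BPE tokenizer, which displays each raw byte as a printable character. That is an inference; the page does not name the tokenizer. Under that assumption, Ø and Ù are the bytes 0xD8 and 0xD9, which are the UTF-8 lead bytes for U+0600 to U+067F in the Arabic block, while ©, ¨, and ¹ are continuation bytes. The sketch below rebuilds GPT-2's byte-to-character table and maps a displayed token back to its bytes. The helpers `token_bytes` and `describe` are hypothetical names.

```python
# Sketch, assuming a GPT-2-style byte-level BPE display mapping.


def bytes_to_unicode():
    """Rebuild GPT-2's reversible byte -> printable-character table."""
    byte_values = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    code_points = byte_values[:]
    n = 0
    for b in range(256):
        if b not in byte_values:
            # Non-printable bytes are shifted to code points 256, 257, ...
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def token_bytes(token):
    """Map a displayed token string back to the raw bytes it stands for."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


def describe(token):
    """Return (hex bytes, UTF-8 text); incomplete sequences become U+FFFD."""
    raw = token_bytes(token)
    return raw.hex(" "), raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["Ø", "Ù", "Ùħ", "¹", "Ø§ÙĦ"]:
        print(repr(tok), describe(tok))
```

For example, `Ùħ` decodes to the bytes `d9 85`, the Arabic letter م, and the byte string `Ø§ÙĦ` decodes to ال. A lone `Ø` is only a lead byte, so it decodes to a replacement character.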
Activation Density: 0.010%
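A density of 0.010% corresponds to a fraction of 0.0001, which means roughly 1 in 10,000 tokens. The sketch below assumes density is the share of sampled token positions where the feature's activation is above zero. The dashboard's exact definition and sample size are not given, and `activation_density` is a hypothetical helper.

```python
# Sketch, assuming density = fraction of token positions with activation > threshold.


def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Toy sample: the feature fires on 1 of 10,000 positions.
    acts = [0.0] * 9_999 + [2.3]
    print(f"{activation_density(acts):.3%}")  # 0.010%
```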