INDEX
Explanations
religious or philosophical concepts and terms
references to various doctrines and their significance
Negative Logits
ilee          -0.72
ells          -0.72
bors          -0.64
(not shown)   -0.63
arters        -0.63
APE           -0.62
oğ            -0.62
vals          -0.62
ilant         -0.61
reen          -0.60
Positive Logits
doctrine      1.09
doctrines     0.92
Doctrine      0.92
istries       0.83
ieu           0.80
arium         0.78
utical        0.78
ologies       0.74
ually         0.73
orthodoxy     0.71
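The two lists above are the highest- and lowest-scoring tokens for this feature. The dashboard does not say how the per-token scores are computed. The sketch below therefore only illustrates the ranking step. It assumes you already have a mapping from token to logit effect, and it reuses a few values from the lists. The function name `top_and_bottom` is hypothetical.

```python
def top_and_bottom(
    effects: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs.

    Hypothetical helper: assumes `effects` maps each token to the feature's
    logit effect on it; how those effects are derived is not shown here.
    """
    # Sort ascending by effect: most negative first.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]        # k most negative effects
    top = ranked[::-1][:k]     # k most positive effects
    return top, bottom


# A few values taken from the lists above.
effects = {
    "doctrine": 1.09,
    "doctrines": 0.92,
    "orthodoxy": 0.71,
    "ilee": -0.72,
    "bors": -0.64,
    "reen": -0.60,
}
top, bottom = top_and_bottom(effects, 2)
print(top)     # [('doctrine', 1.09), ('doctrines', 0.92)]
print(bottom)  # [('ilee', -0.72), ('bors', -0.64)]
```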
Activation Density: 0.022%
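A density of 0.022% means the feature fires rarely: about 22 token positions out of every 100,000. The sketch below shows the arithmetic. It assumes that density means the percentage of token positions where the feature's activation is above zero, which is a common convention in sparse-autoencoder dashboards. The dashboard's exact definition and evaluation dataset are not stated. The function name and the toy data are hypothetical.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper: assumes density = active positions / all positions.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: the feature fires at 22 of 100,000 positions.
acts = [0.0] * 100_000
for i in range(0, 22 * 1000, 1000):  # 22 evenly spaced firing positions
    acts[i] = 1.5
print(f"{activation_density(acts):.3f}%")  # prints 0.022%
```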