Explanations
references to religious figures and their historical significance
Negative Logits
Token       Logit
ì§ľ         -0.16
opat        -0.16
isure       -0.15
зÑĭ         -0.15
aryawan     -0.15
emente      -0.14
achs        -0.14
illance     -0.14
ÑĢаÑħ       -0.14
urat        -0.14
Positive Logits
Token       Logit
worship     0.45
cult        0.40
wor         0.38
worsh       0.36
Worship     0.36
idol        0.35
wor         0.35
idols       0.31
Wor         0.30
Idol        0.26
Activations Density 0.189%
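
As a rough illustration of what a density figure like this could mean, the sketch below assumes that activation density is the fraction of token positions on which the feature fires (activation above zero). That definition, the function name `activation_density`, and the sample data are all assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition: a position counts as "active" when its activation
    is strictly greater than `threshold`.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 189 of 100,000 token positions.
sample = [1.0] * 189 + [0.0] * (100_000 - 189)
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.189%
```

Under this reading, a density of 0.189% corresponds to the feature firing on roughly 2 tokens per 1,000.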