INDEX
Explanations
terms and phrases related to the concept of "cult."
Negative Logits
ity      -0.16
atch     -0.16
ening    -0.15
ieties   -0.15
olen     -0.15
amik     -0.15
annes    -0.14
kir      -0.14
lijke    -0.14
atan     -0.14
Positive Logits
urally   0.33
ivating  0.30
ured     0.30
ura      0.28
ivate    0.28
ures     0.27
ivated   0.27
URAL     0.26
urer     0.25
urable   0.24
Activations Density 0.011%