INDEX
Explanations
references to cults and cult-related terms in various contexts
Negative Logits
atch        -0.17
ctype       -0.16
amik        -0.16
lijke       -0.16
olen        -0.15
annes       -0.15
ening       -0.15
ilder       -0.14
avad        -0.14
OrCreate    -0.14
Positive Logits
urally      0.33
ivating     0.31
ivate       0.31
ivated      0.29
ivation     0.26
URAL        0.25
ured        0.24
IVATE       0.23
ivar        0.23
uur         0.22
Activations Density: 0.008%