Explanations
references to cults
references to cults and cult-like groups
Negative Logits
Token       Logit
deen        -0.73
Turk        -0.70
forth       -0.66
cknowled    -0.65
LOAD        -0.64
Lakes       -0.63
Shay        -0.63
horn        -0.63
Jav         -0.62
lled        -0.62
Positive Logits
Token       Logit
ivating     1.05
ivation     1.01
ivated      0.96
cult        0.96
ists        0.95
etically    0.94
cult        0.93
etic        0.92
urally      0.92
ophon       0.91
Activations Density: 0.013%