Explanations
references to religious or cultural figures and practices
Negative Logits
Eden        -0.16
elden       -0.15
Toolkit     -0.14
pike        -0.14
Monsters    -0.14
.tencent    -0.13
жу          -0.13
ismet       -0.13
Pie         -0.13
UCT         -0.13
Positive Logits
Ling         0.27
ling         0.26
temple       0.26
Lord         0.25
devote       0.24
Lord         0.24
poo          0.23
lord         0.22
temples      0.22
Temp         0.21
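Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive tokens. The page does not say how its values were computed, so the sketch below only assumes that method; `w_dec_row`, `W_U`, and `vocab` are hypothetical inputs, not names taken from this page or any specific library.

```python
import numpy as np

def top_logit_tokens(w_dec_row, W_U, vocab, k=10):
    """Project one feature's decoder direction onto the unembedding.

    Assumed inputs (hypothetical):
      w_dec_row: shape (d_model,), the feature's decoder vector.
      W_U: shape (d_model, vocab_size), the model's unembedding matrix.
      vocab: list of token strings indexed by token id.
    Returns (most_negative, most_positive) lists of (token, logit) pairs.
    """
    logits = w_dec_row @ W_U            # direct logit effect per token, shape (vocab_size,)
    order = np.argsort(logits)          # token ids sorted by ascending logit
    most_negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    most_positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return most_negative, most_positive
```

Note that two entries can share a display string (as `Lord` does above) when the vocabulary contains distinct tokens that render alike, for example with and without a leading space.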
Activation density: 0.184%
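Activation density is usually read as the fraction of sampled tokens on which the feature fires. The page does not state its threshold or token sample, so the sketch below assumes "fires" means a strictly positive activation; `feature_acts` is a hypothetical array of this feature's activations over a token sample.

```python
import numpy as np

def activation_density(feature_acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(feature_acts, dtype=float)
    return float(np.mean(acts > threshold))
```

Under this definition, a density of 0.184% corresponds to roughly 1.84 activating tokens per 1,000 tokens sampled.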