INDEX
Explanations
references to religious or spiritual events and figures
Negative Logits
ette       -0.18
idy        -0.16
obil       -0.15
polator    -0.15
èn         -0.15
еле        -0.15
efeller    -0.14
ане        -0.14
ernel      -0.14
obby       -0.14
Positive Logits
uluk       0.15
verbatim   0.15
Esp        0.14
Asp        0.14
eldo       0.13
.centerX   0.13
ora        0.13
人口       0.13
IRTUAL     0.13
irtual     0.13
Activations Density 0.010%