INDEX
Explanations
references to religious figures and teachings
Negative Logits
âĢª         -0.16
famously    -0.15
Fuck        -0.15
ainen       -0.15
seperate    -0.15
canonical   -0.14
ipur        -0.14
shit        -0.14
fuck        -0.14
ucceed      -0.14
Positive Logits
Ỽ           0.15
pione       0.15
ÑģÑĦ        0.15
Äįe         0.15
µľ          0.14
ajor        0.14
ifestyles   0.14
reten       0.13
ÙĨÙĬÙĨ      0.13
çı          0.13
Activations Density 0.003%