Explanations
inquiries and questions regarding philosophical or moral dilemmas
Negative Logits
uttle    -0.16
umpt     -0.16
esian    -0.16
noop     -0.15
avers    -0.15
subt     -0.14
ALSE     -0.14
Zaman    -0.14
uman     -0.14
oard     -0.14
Positive Logits
whether   0.24
how       0.22
why       0.22
-how      0.20
是否      0.20
How       0.20
Whether   0.19
آیا       0.19
是否      0.19
whether   0.19
Activations Density 0.069%