INDEX
Explanations
concepts related to objective truths and categorical frameworks
Negative Logits
ìĿĦ        -0.20
ry         -0.20
ses        -0.19
Ìĥ         -0.19
ers        -0.19
ld         -0.19
maker      -0.19
soever     -0.19
liness     -0.18
ÑĩиÑĤ      -0.18
Positive Logits
pants      0.17
nature     0.17
-minded    0.16
amente     0.16
yt         0.15
zza        0.15
-destruct  0.15
ament      0.15
zion       0.15
vely       0.15
Activations Density 0.150%