INDEX
Explanations
references to philosophical concepts and discussions
Negative Logits
illet      -0.15
èĶ         -0.15
ris        -0.15
.blit      -0.14
alent      -0.14
縮         -0.14
dash       -0.14
ity        -0.14
ownik      -0.14
rysler     -0.14
Positive Logits
osoph      0.23
ippi       0.21
оÑģоÑĦ     0.20
phil       0.19
osopher    0.19
ical       0.17
á»ģn       0.16
y          0.15
Soph       0.15
Phil       0.15
Activations Density 0.014%
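
Some token strings in the logit lists above (for example `оÑģоÑĦ` and `á»ģn`) look like the raw display form of a byte-level BPE vocabulary rather than readable text. The sketch below assumes the tokenizer uses the GPT-2 style byte-to-character table, which this page does not confirm. The helper name `decode_token` is hypothetical and introduced only for illustration. Characters that are not in the table are treated as already decoded and passed through unchanged.

```python
def bytes_to_unicode():
    """Build the GPT-2 style table mapping each byte (0-255) to a printable character."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted to code points from U+0100 upward.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: display character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a byte-level display string back into text.

    Assumption: the token was rendered with the GPT-2 byte-to-character table.
    Characters outside that table are kept as-is (encoded back to UTF-8).
    Incomplete UTF-8 sequences come out as U+FFFD replacement characters.
    """
    raw = bytearray()
    for ch in token:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["оÑģоÑĦ", "á»ģn", "phil"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, the two tokens decode to a Cyrillic fragment and a Vietnamese fragment, while plain ASCII tokens such as `phil` are unchanged. A token that covers only part of a multi-byte character cannot be fully recovered this way.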