Explanations
references to humanoid life forms and their attributes
Negative Logits
nit       -0.15
น         -0.14
          -0.14
Hess      -0.14
↵ ↵       -0.13
itarian   -0.13
(er       -0.12
pter      -0.12
ت         -0.12
RTC       -0.12
Positive Logits
ously      0.21
ecs        0.16
zhou       0.15
cies       0.15
-vous      0.14
oes        0.14
shire      0.14
ovna       0.14
IMS        0.14
ess        0.14
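Lists like the two above are typically produced by scoring every vocabulary token and keeping the extremes. A minimal sketch, assuming (this page does not say so) that a token's score is the feature's decoder direction dotted with that token's column of the model's unembedding matrix. The names `decoder_vec`, `unembed`, `logit_effects` and `top_and_bottom` are hypothetical, and the toy numbers are made up for illustration.

```python
from typing import List, Tuple


# Hypothetical inputs: a feature's decoder direction (length d_model) and an
# unembedding matrix stored as d_model rows x vocab_size columns.
def logit_effects(decoder_vec: List[float], unembed: List[List[float]]) -> List[float]:
    """Dot the decoder direction with each vocabulary column of the unembedding."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_vec[i] * unembed[i][t] for i in range(len(decoder_vec)))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    effects: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return positive, negative


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["ously", "nit", "shire", "Hess"]
decoder_vec = [1.0, 0.5]
unembed = [
    [0.1, -0.1, 0.1, -0.05],
    [0.2, -0.1, 0.0, -0.1],
]
positive, negative = top_and_bottom(logit_effects(decoder_vec, unembed), vocab, k=2)
print([token for token, _ in positive])  # ['ously', 'shire']
print([token for token, _ in negative])  # ['nit', 'Hess']
```

A real dashboard may also fold in a final layer norm or other scaling, so the values on this page should not be expected to match this plain dot product exactly.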
Activation Density: 3.876%
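Activation density is commonly read as the percentage of sampled tokens on which the feature fires at all. Under that reading, 3.876% means roughly 39 of every 1,000 tokens. A minimal sketch, assuming "fires" means an activation strictly above zero; the function name and `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation is strictly above `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy activations for four tokens (made-up values): exactly one token is active.
print(activation_density([0.0, 1.2, 0.0, 0.0]))  # 25.0
```

Raising `threshold` above zero would give a stricter density that ignores very weak activations.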