INDEX
Explanations
references to specific characters and settings in a narrative context
Negative Logits
лÑİÑĩ      -0.18
arius     -0.17
iven      -0.16
athe      -0.15
lix       -0.14
eed       -0.14
ivec      -0.14
urvey     -0.14
leading   -0.14
ilst      -0.14
Positive Logits
()._      0.15
é«        0.14
à¹Ģส     0.14
Mad       0.13
lac       0.13
ÙĪÙĦÙĬ    0.13
ân        0.13
ÑĨей      0.13
electrom  0.13
ëĭ        0.13
Activations Density 0.171%