Explanations
references to chapters and numerical ratios
Negative Logits
tunggal       -0.61
horabuena     -0.50
glú           -0.48
verkau        -0.47
alimentaria   -0.47
mutlu         -0.47
muerta        -0.47
destroyAll    -0.46
poffible      -0.46
ArrowToggle   -0.46
Positive Logits
effect        0.67
chapter       0.65
Chapters      0.64
chase         0.61
Chapter       0.59
mercy         0.57
chap          0.57
Chapters      0.56
Chap          0.56
genre         0.55
Activations Density 0.298%