INDEX
Explanations
chapter numbers
references to specific chapters in texts
Negative Logits
zzle       -0.75
oggles     -0.73
yg         -0.70
pes        -0.69
berman     -0.68
blaster    -0.66
enez       -0.66
fitting    -0.66
fitt       -0.65
ydia       -0.65
Positive Logits
ĸļ         0.95
chapter    0.92
chapters   0.84
apeake     0.83
chapter    0.81
hound      0.76
Chapter    0.75
icularly   0.75
Chapters   0.73
CHAPTER    0.72
Activations Density 0.015%
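
The density figure can be read as the share of token positions on which the feature fires at all. The sketch below works through that arithmetic under that assumption; the function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 3 active positions out of 20,000 tokens.
sample = [0.0] * 19_997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # 0.015%
```

At a density of 0.015%, the feature would be active on roughly 15 of every 100,000 tokens, which fits a feature that responds narrowly to chapter references.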