Explanations
references to chapters and their content in a text
Negative Logits
consulté    -0.66
eri         -0.60
ose         -0.60
Zool        -0.56
öt          -0.55
ost         -0.55
kling       -0.53
I           -0.53
ass         -0.53
Bru         -0.52
Positive Logits
chapters    1.59
chapter     1.54
Chapters    1.49
chapters    1.47
CHAPTER     1.46
CHAPTER     1.45
Chapter     1.43
Chapter     1.35
Chapters    1.34
chapter     1.34
Activations Density: 0.052%