INDEX
Explanations
references to chapters in a document
Negative Logits
öt       -0.67
ose      -0.63
ost      -0.61
Mule     -0.60
ot       -0.60
kling    -0.59
Zool     -0.59
ưa       -0.58
Sarko    -0.57
os       -0.57
POSITIVE LOGITS
chapters    1.82
chapter     1.76
chapters    1.69
Chapters    1.66
CHAPTER     1.62
Chapter     1.62
CHAPTER     1.59
Chapter     1.55
chapter     1.54
Chapters    1.49
Activations Density 0.103%