INDEX
Explanations
text related to specific sections or segments within a larger context
references to sections or divisions in a document or text
Negative Logits
accompl    -0.72
score      -0.69
forests    -0.69
Whis       -0.69
clos       -0.69
placebo    -0.69
sat        -0.68
outl       -0.68
whis       -0.68
veto       -0.67
Positive Logits
section     4.18
sections    3.48
sectional   1.81
sect        1.74
paragraph   1.67
Section     1.51
chapter     1.41
article     1.22
terms       1.18
slice       1.14
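Tokens with the largest positive and negative logit effects are listed above. A common way to produce such lists for a sparse-autoencoder feature is to project the feature's decoder direction through the model's unembedding matrix and sort the result. The sketch below assumes that method; it is not confirmed by this page. The function name `top_logit_effects`, the argument names, and the toy vocabulary are all hypothetical.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct logit effect.

    Assumption (hypothetical method): the effect on each token's logit is
    the feature's decoder direction (d_model,) multiplied by the
    unembedding matrix W_U (d_model, vocab_size).
    """
    effects = decoder_dir @ W_U           # shape: (vocab_size,)
    order = np.argsort(effects)           # token indices, most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example with a hypothetical 4-token vocabulary and d_model = 2.
vocab = ["section", "forests", "paragraph", "the"]
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[2.0, -1.0, 0.5, 0.0],
                [0.0,  0.0, 0.0, 0.0]])
pos, neg = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(pos)  # highest-effect tokens
print(neg)  # lowest-effect tokens
```

Sorting once and slicing both ends gives the positive and negative lists from a single pass over the vocabulary.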
Activation Density: 0.024%
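Activation density is usually read as the percentage of tokens on which the feature is active. Under that reading, 0.024% corresponds to about 24 tokens out of every 100,000. The minimal sketch below assumes "active" means an activation above a threshold of 0, as with a ReLU-style feature; the function name and threshold are hypothetical, not taken from this page.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption (hypothetical definition): a token counts as active when
    its feature activation is strictly greater than `threshold`.
    """
    activations = np.asarray(activations)
    active = np.count_nonzero(activations > threshold)
    return 100.0 * active / activations.size

# Toy example: 3 active tokens out of 10,000 gives a density of 0.03%.
acts = np.zeros(10_000)
acts[:3] = [1.2, 0.4, 2.0]
density = activation_density(acts)
print(f"{density:.3f}%")
```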