Explanations
phrases indicating a list of items or examples
phrases that introduce or list examples and key ideas
NEGATIVE LOGITS
avery        -0.89
endor        -0.88
etz          -0.87
ulhu         -0.85
iolet        -0.80
uca          -0.79
undle        -0.78
amphetamine  -0.75
byss         -0.75
culosis      -0.74
POSITIVE LOGITS
examples      1.45
reasons       1.30
highlights    1.27
excerpts      1.23
noteworthy    1.16
notable       1.15
salient       1.15
observations  1.14
ways          1.12
facts         1.12
Activations Density 0.102%