INDEX
Explanations
guide-like content offering assistance or information
phrases indicating guidance or instructions
Negative Logits
boycot      -0.69
disliked    -0.69
liking      -0.69
cakes       -0.65
wearing     -0.65
ween        -0.65
kisses      -0.64
stealing    -0.64
Offline     -0.63
staged      -0.62
Positive Logits
summarize    1.50
eluc         1.42
summar       1.36
explain      1.26
enlight      1.25
clarify      1.25
illustrate   1.25
outline      1.22
summarizes   1.21
illuminate   1.19
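The dashboard does not say how these lists were produced. A common convention for sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and report the most positive and most negative token scores. The sketch below assumes that convention. The names `decoder_dir`, `W_U`, `logit_effects` and `top_and_bottom` are hypothetical, and the toy numbers are invented for illustration only.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    # Hypothetical inputs (assumption): decoder_dir is the feature's decoder
    # row (length d_model); W_U is the unembedding matrix (d_model x d_vocab).
    d_model = len(decoder_dir)
    d_vocab = len(W_U[0])
    # Entry j of decoder_dir @ W_U is the feature's direct effect on token j's logit.
    return [sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
            for j in range(d_vocab)]

def top_and_bottom(vocab: Sequence[str], scores: Sequence[float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Sort (token, score) pairs by score, ascending.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]           # k most negative effects
    positive = ranked[::-1][:k]     # k most positive effects
    return positive, negative

# Toy example with made-up numbers (not this feature's real weights).
vocab = ["summarize", "explain", "cakes", "boycot"]
decoder_dir = [1.0, -0.5]
W_U = [[1.0, 0.8, -0.2, -0.3],
       [-0.5, -0.4, 0.9, 0.8]]
positive, negative = top_and_bottom(vocab, logit_effects(decoder_dir, W_U), k=2)
print("positive:", positive)
print("negative:", negative)
```

Real dashboards may apply extra steps, such as folding in a final layer norm or rescaling. This sketch only shows the basic projection-and-rank idea.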
Activation Density: 0.250%
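Assuming "activation density" means the share of token positions on which the feature is active (activation above zero), a density of 0.250% means the feature fires on roughly one token in 400. This is a common definition, but the dashboard does not state it. The helper `activation_density` and its `threshold` parameter below are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    # Fraction of token positions whose activation exceeds the threshold
    # (assumed definition; threshold=0.0 treats any positive value as active).
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Toy example: the feature fires on 1 of 400 tokens.
acts = [0.0] * 399 + [3.2]
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.250%
```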