INDEX
Explanations
phrases indicating importance, value, or focus
phrases introducing significant concepts or statements
Negative Logits
Token       Logit
robe        -0.64
say         -0.63
aciously    -0.63
order       -0.62
ahime       -0.62
hip         -0.62
mission     -0.61
pointer     -0.61
erton       -0.61
due         -0.59
Positive Logits
Token           Logit
bothers         1.32
distinguishes   1.23
happens         1.23
separates       1.20
happened        1.16
mattered        1.13
bothered        1.09
hurts           1.00
pops            1.00
sticks          1.00
Activations Density 0.137%
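
One way to read the density figure is as the share of sampled tokens on which the feature fires at all. The sketch below illustrates that reading only. The function name `activation_density`, the zero threshold, and the sample counts are assumptions made for illustration; the page does not say how the 0.137% figure was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the fraction of tokens with a non-zero
    activation. This is an illustrative definition, not one taken from
    the dashboard.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 137 of 100,000 sampled tokens.
sample = [1.0] * 137 + [0.0] * (100_000 - 137)
print(f"{activation_density(sample):.3f}%")  # prints 0.137%
```

Under this reading, a density of 0.137% corresponds to roughly one active token in every 730, so the feature is sparse.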