Explanations
verbs indicating continuation or extension of a previous thought or action
verbs indicating ongoing or repetitive actions
Negative Logits
transformative  -0.62
helps           -0.56
harms           -0.53
Workers         -0.53
aples           -0.51
beware          -0.51
Gifts           -0.50
chrome          -0.50
stand           -0.50
hires           -0.49
Positive Logits
rhet            0.76
Laughs          0.73
_.              0.68
ONSORED         0.66
laugh           0.65
Rh              0.64
quoting         0.64
diplom          0.64
laughs          0.63
sarcast         0.63
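Lists like the negative and positive logits above are ranked views of a single score per vocabulary token. The sketch below shows only the ranking step. It assumes those per-token scores have already been computed; a common choice is the feature's decoder direction projected through the model's unembedding, but this page does not say how its scores were derived. The function name top_logit_effects is hypothetical, and the toy input reuses a few values from the lists above.

```python
from typing import Dict, List, Tuple

Effect = Tuple[str, float]


def top_logit_effects(effects: Dict[str, float], k: int = 10) -> Tuple[List[Effect], List[Effect]]:
    """Return the k most negative and the k most positive (token, score) pairs.

    Assumption: `effects` already maps each token to one scalar logit effect;
    how that scalar is computed is outside this sketch.
    """
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]                                    # most negative first
    positive = list(reversed(ranked))[:k]                   # most positive first
    return negative, positive


# Toy input taken from a handful of the values listed above.
sample = {
    "transformative": -0.62,
    "helps": -0.56,
    "rhet": 0.76,
    "Laughs": 0.73,
    "_.": 0.68,
}

neg, pos = top_logit_effects(sample, k=2)
print(neg)  # [('transformative', -0.62), ('helps', -0.56)]
print(pos)  # [('rhet', 0.76), ('Laughs', 0.73)]
```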
Activations Density 0.120%
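Activation density is usually read as the share of token positions on which the feature fires. The sketch below computes that percentage from a flat list of activations. It assumes a feature "fires" when its activation is strictly above a threshold of 0.0; the dashboard does not state its exact threshold or sample size. The function name activation_density and the toy data are hypothetical. With 12 firing positions out of 10,000 tokens, it reproduces a density of 0.120%.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of positions whose activation is strictly above `threshold`.

    Assumption: "firing" means activation > threshold (0.0 by default).
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical data: the feature fires on 12 of 10,000 token positions.
acts = [0.0] * 9_988 + [1.5] * 12
print(f"Activations Density {activation_density(acts):.3f}%")  # Activations Density 0.120%
```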