INDEX
Explanations
action words suggesting consideration or specific tasks
phrases that suggest recommendations or advice
Negative Logits
indle      -0.70
DX         -0.66
ynthesis   -0.65
ille       -0.63
idy        -0.63
ilian      -0.62
opl        -0.61
ophone     -0.60
VID        -0.58
ophobia    -0.57
Positive Logits
consider    0.82
reconsider  0.75
EStream     0.73
advis       0.72
rethink     0.71
ij士        0.69
abl         0.69
gotten      0.68
)=(         0.68
vised       0.67
Activations Density: 0.080%