INDEX
Explanations
phrases related to doing things correctly or in the right way
phrases related to moral or ethical decision-making
Negative Logits
Token          Logit
urated         -0.71
raltar         -0.69
arsen          -0.68
lodged         -0.67
icia           -0.65
vanquished     -0.62
angered        -0.61
edia           -0.61
hner           -0.61
settled        -0.59
Positive Logits
Token          Logit
thing           1.24
chores          1.16
things          1.06
stunts          1.05
tasks           1.03
job             1.00
homework        0.99
flips           0.96
calculations    0.96
thing           0.94
Activations Density 0.242%