Explanations
phrases indicating permission or invitation
phrases encouraging freedom and the invitation to share opinions
Negative Logits
fam                 -0.67
ELD                 -0.65
ritical             -0.63
dated               -0.61
quickShipAvailable  -0.60
croft               -0.60
lehem               -0.59
assium              -0.59
iron                -0.59
Adin                -0.59
Positive Logits
choose              1.14
roam                1.10
decide              1.10
improv              1.05
experiment          1.01
pursue              1.00
criticize           0.98
explore             0.98
disagree            0.96
modify              0.95
Activations Density 0.115%
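
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is computed: the fraction of token positions on which the feature fires. This is an assumption about how the value was produced, not something the dashboard states. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only so that the result matches 0.115%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is
    active at all; the dashboard itself does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 20,000 token positions, 23 of them with a nonzero
# activation. These counts are illustrative, not taken from the source.
sample = [0.0] * 19977 + [1.5] * 23

density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the feature is silent on the vast majority of tokens, which fits a feature tied to a narrow class of phrases such as those granting permission or inviting opinions.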