INDEX
Explanations
phrases indicating a suggestion or command for action
the word "better" in various contexts, indicating advice or recommendations
Negative Logits
eur         -0.73
Nob         -0.72
esville     -0.72
etic        -0.70
untarily    -0.67
urous       -0.66
oidal       -0.62
éĥ          -0.61
MG          -0.61
itory       -0.61
Positive Logits
behaved      0.80
suited       0.78
luck         0.76
ment         0.75
manners      0.72
beware       0.72
than         0.71
idge         0.70
ments        0.69
lett         0.66
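The two logit lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. The sketch below shows one common way such a ranking can be produced. It is an assumption rather than this dashboard's documented method: the score for each token is taken as the dot product of the feature's decoder direction with that token's column of the unembedding matrix. The function names and all numbers in the usage example are hypothetical toy values, not data from this feature.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: effect = dot(feature decoder direction, token's unembedding column).
# Real dashboards may normalize or otherwise post-process these scores.

def logit_effects(decoder_dir, unembed, vocab):
    """Return (token, score) pairs sorted from most negative to most positive.

    decoder_dir: list[float] of length d_model (the feature's write direction)
    unembed: d_model rows, each a list[float] with one entry per vocab token
    vocab: list[str] of token strings
    """
    scores = []
    for j, tok in enumerate(vocab):
        # Dot product over the residual-stream dimension for token j.
        s = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((tok, s))
    return sorted(scores, key=lambda pair: pair[1])


def top_and_bottom(decoder_dir, unembed, vocab, k=10):
    """Split the ranking into the k most negative and k most positive tokens."""
    ranked = logit_effects(decoder_dir, unembed, vocab)
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example (made-up values, d_model = 2, four tokens).
vocab = ["behaved", "suited", "eur", "Nob"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.6, 0.5, -0.5, -0.3],
    [0.4, 0.5, -0.4, -0.2],
]
neg, pos = top_and_bottom(decoder_dir, unembed, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other, which matches how the dashboard presents them as two views of a single ranking.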
Activations Density 0.040%
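An activation density of 0.040% means the feature fires on a fraction 0.0004 of token positions, or roughly 4 in every 10,000 tokens. The sketch below shows that arithmetic. It assumes "fires" means an activation strictly greater than zero. The dashboard's exact threshold and token sample are not stated here, so both the `threshold` parameter and the helper name are hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of token positions
# where a feature's activation exceeds a threshold (assumed to be 0.0 here).

def activation_density(activations, threshold=0.0):
    """Return the percentage of positions where activation > threshold."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example: 4 firing positions out of 10,000 tokens gives 0.04%.
acts = [0.0] * 9996 + [1.2, 0.3, 2.5, 0.9]
print(f"{activation_density(acts):.3f}%")
```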