INDEX
Explanations
expressions of encouragement and support for actions or behaviors
NEGATIVE LOGITS
id              -0.76
lands           -0.75
("")]           -0.75
ber             -0.65
as              -0.63
}{|             -0.63
off             -0.62
io              -0.62
queline         -0.62
land            -0.61
POSITIVE LOGITS
encouraged      2.04
encourage       2.01
encourages      2.00
Encourage       1.97
encouragement   1.92
Encourage       1.87
couraged        1.82
encouraging     1.74
couraging       1.73
encouragement   1.73
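The page does not say how the logit lists are computed. A common approach in SAE feature dashboards is to project the feature's decoder direction through the model's unembedding matrix, then report the highest- and lowest-scoring tokens. The sketch below assumes that approach and deliberately ignores details a real tool may add, such as normalization or scaling. The function names `feature_logits` and `top_and_bottom` are hypothetical, and the weights are made-up toy numbers rather than real model values.

```python
from typing import List, Sequence, Tuple


def feature_logits(
    decoder_dir: Sequence[float],
    w_unembed: Sequence[Sequence[float]],
) -> List[float]:
    """Project one feature's decoder direction (length d_model) through an
    unembedding matrix stored as d_model rows of vocab_size entries.
    Assumption: no layer norm or scaling is applied (hypothetical helper)."""
    vocab_size = len(w_unembed[0])
    return [
        sum(d * row[j] for d, row in zip(decoder_dir, w_unembed))
        for j in range(vocab_size)
    ]


def top_and_bottom(
    vocab: Sequence[str], logits: Sequence[float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, logit) pairs."""
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_positive, most_negative


# Toy example with made-up numbers (d_model = 2, vocabulary of 3 tokens).
vocab = ["encourage", "land", "the"]
decoder_dir = [1.0, 0.5]
w_unembed = [
    [2.0, -0.5, 0.25],
    [0.0, -0.5, 0.5],
]
logits = feature_logits(decoder_dir, w_unembed)
positive, negative = top_and_bottom(vocab, logits, k=1)
print(positive, negative)  # [('encourage', 2.0)] [('land', -0.75)]
```

Under this reading, the positive list above is where the feature most directly raises next-token scores (variants of "encourage"). The negative list holds tokens whose scores the feature most directly lowers.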
ACTIVATION DENSITY: 0.096%
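Activation density is usually read as the share of token positions on which the feature fires. At 0.096%, that works out to roughly 96 of every 100,000 tokens. Below is a minimal sketch, assuming "fires" means an activation strictly above a threshold of 0.0. The tool's actual threshold and token sample are not stated, and `activation_density` is a hypothetical name.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose feature activation is strictly
    greater than `threshold` (assumed 0.0 here; hypothetical helper)."""
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Made-up data: 1 active position out of 1,000 tokens gives 0.1%.
acts = [0.0] * 999 + [1.3]
print(activation_density(acts))  # 0.1
```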