Explanations
phrases emphasizing taking initiative and responsibility
Negative Logits
idth            -0.16
otherapy        -0.16
uju             -0.16
taj             -0.15
ве              -0.15
udge            -0.15
agogue          -0.15
legen           -0.14
شت              -0.14
nici            -0.14
Positive Logits
advantage        0.39
seriously        0.32
risks            0.28
liberties        0.26
Seriously        0.24
responsibility   0.24
refuge           0.24
Advantage        0.23
steps            0.23
cues             0.23
Activation Density: 0.292%