INDEX
Explanations
instances of granting access, permissions, or responsibilities
Negative Logits
tep   -0.16
ime   -0.16
arse  -0.16
cht   -0.15
àn    -0.15
etz   -0.15
cob   -0.14
勒    -0.14
oga   -0.14
得    -0.14
Positive Logits
tasks         0.24
opportunity   0.21
task          0.20
instructions  0.19
Tasks         0.19
任务          0.19
tasks         0.18
assignment    0.18
Tasks         0.18
freedom       0.18
Activations Density 0.098%