INDEX
Explanations
enforce rules and maintain order
Negative Logits
supervise      -0.12
superv         -0.11
supervised     -0.11
meas           -0.10
grav           -0.10
restraining    -0.09
fortified      -0.09
lawy           -0.09
彻             -0.09
solicit        -0.09
Positive Logits
仲             0.17
med            0.16
决定           0.15
決定           0.15
arbit          0.15
decisions      0.14
decision       0.14
decide         0.14
police         0.13
decides        0.13
Activations Density 0.120%