INDEX
Explanations
phrases related to oversight and control
references to the concept of supervision
NEGATIVE LOGITS
outp          -0.72
Scythe        -0.69
Rush          -0.69
ifles         -0.67
cients        -0.66
ieft          -0.66
erker         -0.66
HRC           -0.66
itiveness     -0.66
egg           -0.66
POSITIVE LOGITS
supervised     3.46
supervision    2.51
superv         1.58
vised          1.39
Sud            1.28
monitored      1.17
overseen       1.15
vard           1.11
curfew         1.09
lihood         0.96
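On SAE feature dashboards of this kind, the positive and negative logit lists are usually obtained by projecting the feature's decoder direction through the model's unembedding matrix and keeping the largest and smallest entries. The sketch below assumes a hypothetical decoder vector `w_dec`, a hypothetical unembedding `W_U` (d_model x vocab_size, as a list of rows) and a toy vocabulary; none of these reproduce the values listed above.

```python
from typing import List, Sequence, Tuple


def logit_effects(w_dec: Sequence[float], W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction onto the vocabulary (w_dec @ W_U).

    w_dec has length d_model; W_U is a d_model x vocab_size matrix given as rows.
    """
    vocab_size = len(W_U[0])
    return [
        sum(w_dec[d] * W_U[d][v] for d in range(len(w_dec)))
        for v in range(vocab_size)
    ]


def top_and_bottom(
    effects: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, logit) pairs."""
    pairs = list(zip(vocab, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Hypothetical toy data: d_model = 2, vocabulary of 4 tokens.
    vocab = ["supervised", "monitored", "egg", "Rush"]
    W_U = [
        [2.0, 1.0, -0.5, -1.0],  # unembedding row for residual dimension 0
        [0.5, 0.2, -0.1, 0.3],   # unembedding row for residual dimension 1
    ]
    w_dec = [1.0, 1.0]  # hypothetical decoder direction of the feature
    effects = logit_effects(w_dec, W_U)
    pos, neg = top_and_bottom(effects, vocab, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Sorting both ends of the same score vector is what yields one list of boosted tokens and one list of suppressed tokens from a single projection.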
Activations Density 0.051%
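Activation density is commonly taken as the share of sampled tokens on which the feature's activation is nonzero; at 0.051% this feature fires on roughly one token in two thousand. A minimal sketch, assuming activations are available as a flat list of floats and that "active" means strictly above a hypothetical `threshold` of 0:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


if __name__ == "__main__":
    # Hypothetical sample: 10,000 tokens, of which 5 activate the feature.
    sample = [0.0] * 9995 + [1.0] * 5
    print(f"{activation_density(sample):.3f}%")
```

Reporting the value as a percentage keeps very sparse features readable; a density this low suggests a narrowly scoped feature rather than one that fires on common tokens.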