Explanations
phrases related to supervision and oversight
Negative Logits
orians    -0.78
orian     -0.75
hner      -0.72
endor     -0.71
tery      -0.69
blank     -0.69
ml        -0.68
growth    -0.68
UES       -0.68
hon       -0.67
Positive Logits
supervision    1.33
supervised     1.10
superv         0.99
probation      0.92
oversee        0.89
confir         0.86
conduc         0.85
Instruct       0.76
overseen       0.74
overseeing     0.71
Activations Density: 0.011%