INDEX
Explanations
phrases related to oversight and monitoring
Negative Logits
[token missing]  -0.60
u                -0.59
ne               -0.59
\\               -0.58
ké               -0.58
z                -0.58
r                -0.57
₹                -0.56
j                -0.56
•                -0.56
Positive Logits
monitoring   1.75
monitored    1.67
monitors     1.56
monitoring   1.53
monitor      1.53
Monitoring   1.50
Monitors     1.46
itored       1.43
MONITORING   1.41
MONITOR      1.40
Activations Density 0.115%
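
The page does not define the density figure. A minimal sketch of one common reading, assuming it is the fraction of sampled token positions on which the feature's activation is above zero, is given below. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration; they are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The dashboard itself does not state this.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 23 of 20,000 token positions.
sample = [0.0] * 19_977 + [1.3] * 23
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.115%
```

Under this reading, a density of 0.115% means the feature is active on roughly one token in 870, which fits a feature tied to a narrow family of tokens such as the "monitor" variants listed above.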