INDEX
    Explanations

    phrases related to oversight and monitoring

    Negative Logits
    (token not recovered)    -0.60
    " u"                     -0.59
    " ne"                    -0.59
    "\\"                     -0.58
    " ké"                    -0.58
    " z"                     -0.58
    " r"                     -0.57
    " ₹"                     -0.56
    " j"                     -0.56
    (token not recovered)    -0.56
    Positive Logits
    " monitoring"    1.75
    " monitored"     1.67
    " monitors"      1.56
    "monitoring"     1.53
    " monitor"       1.53
    " Monitoring"    1.50
    " Monitors"      1.46
    "itored"         1.43
    " MONITORING"    1.41
    " MONITOR"       1.40
    Act Density 0.115%

    No Known Activations