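    Negative Logits
    " دولة"  -0.08  (Arabic: "state")
    "putation"  -0.08
    "ikipedia"  -0.08
    " erzielt"  -0.08  (German: "achieved")
    "entscheid"  -0.07  (German stem: "decide/decision")
    " argued"  -0.07
    " נמ"  -0.07  (Hebrew fragment)
    " الماضي"  -0.07  (Arabic: "the past")
    "观点"  -0.07  (Chinese: "viewpoint")
    (token not shown)  -0.07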
    Positive Logits
    ".monitor"  0.13
    " monitor"  0.12
    "Monitoring"  0.12
    " monitoring"  0.12
    " Monitoring"  0.11
    " vigilant"  0.11
    (token not shown)  0.11
    " Monitor"  0.11
    (token not shown)  0.11
    "Monitor"  0.11
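Lists like the two above come from scoring every vocabulary token and keeping the extremes at each end. The page does not say how the scores are computed; a common approach is to project the feature's direction through the model's unembedding matrix, but treat that as an assumption. A minimal sketch of the ranking step only, assuming the per-token scores are already in a dict (the `top_and_bottom_tokens` helper is hypothetical, and the toy scores are a few rows copied from the lists above):

```python
import heapq


def top_and_bottom_tokens(logit_effects, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    `logit_effects` is a hypothetical dict mapping token strings to scalar
    logit effects; how those scores are computed is assumed, not shown here.
    """
    items = logit_effects.items()
    # nsmallest/nlargest behave like sorted(...)[:k], which is stable,
    # so tokens with tied scores keep their dict insertion order.
    most_negative = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    most_positive = heapq.nlargest(k, items, key=lambda kv: kv[1])
    return most_negative, most_positive


# Toy scores copied from a few rows of the lists above.
scores = {
    " دولة": -0.08,
    "ikipedia": -0.08,
    " argued": -0.07,
    " vigilant": 0.11,
    " monitor": 0.12,
    ".monitor": 0.13,
}
negative, positive = top_and_bottom_tokens(scores, k=2)
print(negative)
print(positive)
```

Keeping tokens as exact strings matters here: " monitor" and ".monitor" are different vocabulary entries, which is why both appear as separate rows.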
    Activation Density: 0.010%
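A density of 0.010% corresponds to about 1 firing position in every 10,000 tokens. A minimal sketch of that calculation, assuming density is the percentage of token positions whose activation exceeds zero (both the `activation_density` helper and the zero threshold are assumptions, not taken from the page):

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    The zero default threshold is an assumption for illustration.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# One firing position out of 10,000 reproduces the density shown above.
acts = [0.0] * 9_999 + [0.7]
print(f"{activation_density(acts):.3f}%")  # → 0.010%
```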

    No Known Activations