INDEX
    NEGATIVE LOGITS
    Token              Logit
    "Impro"            -0.08
    "-bal"             -0.08
    "-Control"         -0.08
    " theoretically"   -0.07
    " Inspector"       -0.07
    " ಹು"              -0.07
    "ಗ್ಗ"               -0.07
    "ෙන්"              -0.07
    "igs"              -0.07
    " Bala"            -0.07
    POSITIVE LOGITS
    Token              Logit
    " specifically"    0.08
    "वाही"             0.08
    " hi"              0.08
    " unspecified"     0.07
    " contraception"   0.07
    "отруд"            0.07
    " loving"          0.07
    "792"              0.07
    " wrongdoing"      0.07
    " supposedly"      0.07
    Activation Density 0.173%

    No Known Activations