    Negative Logits
    Token            Logit
    "la"             -0.07
    (token missing)  -0.07
    (token missing)  -0.07
    " Idol"          -0.07
    "سط"             -0.07
    " dolor"         -0.07
    "-translate"     -0.07
    "IDAD"           -0.07
    " statusBar"     -0.07
    (token missing)  -0.07
    Positive Logits
    Token            Logit
    " keep"          0.18
    " keeping"       0.14
    " kept"          0.14
    " keeps"         0.13
    " Keep"          0.12
    " KEEP"          0.12
    "keep"           0.12
    "Keep"           0.11
    " Keeps"         0.10
    "Keeping"        0.10
    Activation Density: 0.037%

    No Known Activations