    Negative Logits
        Token            Logit
        " occurring"     -0.09
        " occurs"        -0.09
        "(unsigned"      -0.08
        "_allowed"       -0.08
        "(MSG"           -0.08
        " miracles"      -0.08
        "(M"             -0.07
        "_MON"           -0.07
        " происходит"    -0.07  (Russian: "occurs")
        "(ID"            -0.07
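    Positive Logits
        Token            Logit
        " hopefully"     0.13
        "hopefully"      0.11
        " Hopefully"     0.11
        " unus"          0.10
        " Finished"      0.10
        " Thanks"        0.10
        " exhausted"     0.09
        " доволь"        0.09  (fragment of a Russian word, as in "доволен", "pleased")
        "Thanks"         0.09
        " siap"          0.09  (Indonesian/Malay: "ready")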
    Activation Density: 0.179%

    No Known Activations