INDEX
    Explanations

    improved model or version

    Negative Logits
                    0.42
    " επισ"         0.39
    " madness"      0.39
    " climbing"     0.39
    " dangerous"    0.38
    " sirven"       0.38
    " hazardous"    0.38
    " embezz"       0.38
    " گزار"         0.37
    " للخ"          0.37
    Positive Logits
    "behavior"      0.43
    "extensions"    0.40
    "model"         0.40
    "UTCTime"       0.38
    " MODEL"        0.38
    "Behavior"      0.37
    "管制"          0.37
    "ijk"           0.37
    " Behavior"     0.37
    " libert"       0.36
    Act Density 0.000%

    No Known Activations