    NEGATIVE LOGITS
     Vacation       -0.07
     مدينة          -0.07
    Compet          -0.07
    )",↵            -0.07
     Faction        -0.07
     Examination    -0.07
    precated        -0.07
     Applications   -0.07
     hospodář       -0.07
     zach           -0.07
    POSITIVE LOGITS
     spill          0.12
     spilled        0.11
     spills         0.10
     Serialized     0.07
     Feinstein      0.07
    filled          0.07
     leaked         0.07
    IFO             0.06
    _RB             0.06
    blur            0.06
    Activation Density: 0.002%

    No Known Activations