    Negative Logits
        (not shown)    -0.08
        (not shown)    -0.07
        " EITHER"      -0.07
        "مخاطر"        -0.07
        (not shown)    -0.07
        "emade"        -0.07
        (not shown)    -0.07
        " rentals"     -0.07
        "_news"        -0.06
        "ibraltar"     -0.06
    Positive Logits
        "щик"           0.08
        "权威"          0.07
        " great"        0.07
        "BE"            0.07
        " awards"       0.06
        ".Interface"    0.06
        " respect"      0.06
        (whitespace)    0.06
        " monarch"      0.06
        "']]"           0.06
    Act Density 0.000%

    No Known Activations