    Negative Logits
    Token           Logit
    'ainted'        -0.80
    'advertising'   -0.74
    'shire'         -0.70
    'EP'            -0.67
    'EVA'           -0.66
    'este'          -0.65
    'mberg'         -0.65
    'hr'            -0.65
    'ité'           -0.64
    'Ward'          -0.63
    Positive Logits
    Token           Logit
    ' than'         1.90
    ' Than'         1.65
    'than'          1.61
    ' versions'     0.90
    ' "$:/'         0.86
    'iating'        0.82
    ' ado'          0.73
    ' Faster'       0.73
    ' behaved'      0.72
    'versions'      0.68
    Activation Density: 0.179%

    No Known Activations