INDEX
    Explanations

    comparative phrases and terms related to performance or quality

    Negative Logits
    Token         Logit
    "efs"         -0.20
    ".effects"    -0.18
    "änger"       -0.15
    "атки"        -0.15
    "bih"         -0.15
    "achen"       -0.14
    "reff"        -0.14
    "_GF"         -0.14
    "ellung"      -0.14
    " куль"       -0.14
    Positive Logits
    Token             Logit
    " fare"           0.32
    " Fare"           0.28
    " performance"    0.26
    " fares"          0.25
    " better"         0.24
    " performances"   0.24
    "fare"            0.23
    " well"           0.23
    " worse"          0.22
    "performance"     0.22
    Activation Density: 0.093%

    No Known Activations