INDEX
    Explanations

    phrases that indicate a comparison or evaluation of success

    Negative Logits
    olec                -0.15
    arkin               -0.15
    utow                -0.15
     Remaining          -0.14
    urai                -0.14
    ConverterFactory    -0.14
    atto                -0.14
    aviour              -0.14
    <count              -0.13
    AME                 -0.13
    Positive Logits
     worse              0.20
     improvement        0.18
     improve            0.17
     mejorar            0.17
     improves           0.17
     Worse              0.16
     Improve            0.16
     melhor             0.15
    orsche              0.15
     improvements       0.15
    Activation Density: 0.116%

    No Known Activations