    Explanations

    accuracy and correctness

    Negative Logits
    " literalmente"  0.47
    " pronouns"  0.44
    " preoccupation"  0.44
    " mnoho"  0.43
    " agak"  0.42
    " uprisings"  0.41
    " elaboration"  0.40
    " political"  0.40
    " carpeting"  0.40
    " inventions"  0.39
    Positive Logits
    " accurate"  0.83
    " оптима"  0.83
    "正确"  0.82
    " সঠিক"  0.79
    "正確"  0.77
    " adequately"  0.77
    "accurate"  0.77
    " optimale"  0.75
    " correct"  0.75
    " correctly"  0.75
    Activation Density: 0.752%

    No Known Activations