    Explanations

    improvements and explanations

    Negative Logits
    " fish"           0.80
    " war"            0.74
    " is"             0.73
    " if"             0.73
    " ur"             0.73
    " k"              0.72
    " one"            0.71
    " i"              0.71
    " I"              0.70
    " people"         0.70

    Positive Logits
    " Verbesser"      1.08
    "Improvements"    0.89
    " mejoras"        0.83
    " Improvements"   0.82
    " улуч"           0.81
    "improved"        0.81
    " verbess"        0.80
    " изменений"      0.80
    " увеличения"     0.79
    "improvements"    0.79
    Activation Density: 0.327%

    No Known Activations