    Negative Logits
     CreateTagHelper    -0.96
     EconPapers         -0.81
     חיצוניים           -0.80
    balanced            -0.79
     виправивши         -0.74
    balance             -0.73
     balanced           -0.72
     balance            -0.71
    Balanced            -0.70
     Balanced           -0.68
    Positive Logits
     by                 0.59
    __*/                0.56
     in                 0.55
     so                 0.45
    dieu                0.45
     for                0.42
    ary                 0.41
    so                  0.41
    son                 0.40
    !*\                 0.40
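Lists like the two above rank vocabulary tokens by how strongly a feature pushes their logits down or up. A minimal sketch, assuming the common "logit lens" style definition in which a feature's effect on a token is the dot product of the feature's decoder direction with that token's unembedding vector. The function name `top_logit_effects`, the 2-dimensional toy vectors, and the four-token vocabulary are hypothetical and are not taken from this page.

```python
from typing import Dict, List, Tuple


def top_logit_effects(
    decoder_vec: List[float],
    unembed: Dict[str, List[float]],
    k: int = 3,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Score each token by dot(decoder direction, token unembedding vector).

    Assumption: this dot-product definition of "logit effect" is a common
    convention, not necessarily the exact method used to build this page.
    """
    scores = {
        tok: sum(d * u for d, u in zip(decoder_vec, col))
        for tok, col in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]        # most suppressed tokens, most negative first
    positive = ranked[::-1][:k]  # most promoted tokens, most positive first
    return negative, positive


# Hypothetical toy data: a 2-dimensional residual stream and four tokens.
decoder = [1.0, -0.5]
unembed = {
    " by": [0.4, -0.3],
    " in": [0.3, -0.2],
    "balanced": [-0.6, 0.4],
    " balance": [-0.5, 0.3],
}
neg, pos = top_logit_effects(decoder, unembed, k=2)
print("negative:", [t for t, _ in neg])  # negative: ['balanced', ' balance']
print("positive:", [t for t, _ in pos])  # positive: [' by', ' in']
```

Note that leading spaces are part of the token strings, which is why the lists above contain both " balanced" and "balanced" as separate entries.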
    Activation Density: 0.003%

    No Known Activations
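The activation density figure can be read as the percentage of sampled tokens on which the feature fires. A minimal sketch, assuming that definition and a firing threshold of zero; the function name `activation_density` and the 100,000-token sample are hypothetical and do not come from this page's data.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = list(activations)
    if not acts:
        return 0.0
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


# Hypothetical sample: the feature fires on 3 of 100,000 tokens.
sample = [0.0] * 100_000
for i in (10, 5_000, 99_999):
    sample[i] = 1.2

print(f"{activation_density(sample):.3f}%")  # prints 0.003%
```

At this density a feature fires roughly 3 times per 100,000 tokens, so a small sample can easily contain no activations at all.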