INDEX
    Explanations
    NEGATIVE LOGITS
    Token             Logit
    ,                 -1.58
     flesta           -0.78
     varandra         -0.69
    "                 -0.66
    ',                -0.65
    ",                -0.63
     TextAppearance   -0.62
     ddelweddau       -0.61
    ،                 -0.61
     säll             -0.60
    POSITIVE LOGITS
    Token             Logit
     but              1.18
     and              1.17
     which            0.97
     as               0.96
     although         0.93
     or               0.91
     including        0.90
     with             0.89
     if               0.86
     along            0.85
    Activation Density: 3.219%

    No Known Activations