INDEX
    Explanations

    phrases indicating belonging or membership within a group

    Negative Logits
    oun             -0.16
    .scalablytyped  -0.15
    931             -0.15
    osed            -0.15
    obia            -0.14
    hoff            -0.14
    kır             -0.14
    uler            -0.14
    zas             -0.14
    ünd             -0.14
    Positive Logits
     few            0.20
     Few            0.16
    ema             0.15
    maal            0.15
    strup           0.15
     many           0.15
    åĮ              0.14
    apo             0.14
    _many           0.14
     fier           0.14
    Act Density 0.120%

    No Known Activations