INDEX
    Explanations

    comparisons and contrasts between concepts

    Negative Logits
    Token        Logit
    ".BLL"       -0.15
    "earn"       -0.15
    "acco"       -0.14
    " сло"       -0.14
    " faux"      -0.14
    "ulle"       -0.14
    " nors"      -0.14
    "ωνα"        -0.13
    "arn"        -0.13
    " silver"    -0.13
    Positive Logits
    Token           Logit
    "--)"           0.16
    ".parameter"    0.15
    "593"           0.14
    "gger"          0.14
    "ugin"          0.14
    "atego"         0.14
    "ераль"         0.14
    "ابت"           0.14
    "jt"            0.14
    "сов"           0.13
    Act Density 0.155%

    No Known Activations