    Explanations

    contrasts or differences between concepts or ideas

    Negative Logits
    " lele"          -1.06
    " mef"           -0.94
    " fei"           -0.90
    " fta"           -0.90
    " meis"          -0.90
    " myn"           -0.88
    " afo"           -0.87
    " paff"          -0.87
    " wien"          -0.86
    " lara"          -0.86

    Positive Logits
    " merely"        0.67
    " nevertheless"  0.60
    "而是"           0.59
    " simply"        0.58
    " rather"        0.57
    " nonetheless"   0.56
    " instead"       0.56
    "upné"           0.53
    " actually"      0.53
    " yet"           0.52
    Activation Density: 0.115%

    No Known Activations