    Explanations
    No Explanations Found
    Negative Logits
    "oč"       -0.18
    " unlike"  -0.15
    "ァ"       -0.15
    "mdir"     -0.14
    "antom"    -0.13
    "ogn"      -0.13
    "evin"     -0.13
    "annis"    -0.13
    "utin"     -0.13
    "974"      -0.13
    Positive Logits
    " same"   1.10
    "same"    1.02
    "Same"    0.93
    " Same"   0.91
    " SAME"   0.82
    "同"      0.79
    "_same"   0.77
    " mismo"  0.75
    " mesma"  0.71
    "相同"    0.71
    Act Density 0.470%

    No Known Activations