INDEX
    Explanations

    references to similarity or sameness

    Negative Logits
    "rawler"      -0.14
    "aro"         -0.13
    "asures"      -0.13
    "ilation"     -0.12
    "ือ"          -0.12
    "ILA"         -0.12
    " Lap"        -0.12
    "izable"      -0.12
    " finally"    -0.12
    "orna"        -0.12

    Positive Logits
    " same"        0.84
    "same"         0.78
    "Same"         0.67
    " Same"        0.65
    " SAME"        0.62
    "同"           0.60
    "_same"        0.59
    "SAME"         0.59
    "相同"         0.57
    " mismo"       0.56
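    This page does not say how the logit lists are computed. A common approach, assumed here and not confirmed by the page, projects the feature's decoder direction through the model's unembedding matrix, scores every token by a dot product, and keeps the k highest and k lowest scores. The sketch below illustrates that idea. The function names (`logit_effects`, `top_logits`) are hypothetical, and the two-dimensional vectors are made-up toy values, not weights or results from this feature.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def logit_effects(decoder_dir: Vector, unembed: Dict[str, Vector]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder direction
    with that token's unembedding vector (an assumed method, not a confirmed one)."""
    return {
        token: sum(d * w for d, w in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) tokens, strongest first."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]          # most negative scores first
    positive = ranked[::-1][:k]    # most positive scores first
    return positive, negative


# Toy 2-d model with made-up vectors (not real model weights).
decoder_dir = [1.0, 0.0]
unembed = {
    " same": [0.8, 0.1],
    "mismo": [0.5, 0.0],
    " finally": [-0.1, 0.5],
    "rawler": [-0.2, 0.3],
}
positive, negative = top_logits(logit_effects(decoder_dir, unembed), k=2)
print(positive)  # [(' same', 0.8), ('mismo', 0.5)]
print(negative)  # [('rawler', -0.2), (' finally', -0.1)]
```

    Real pipelines may add steps omitted here, such as folding in a final layer norm, so treat this only as an outline of the ranking step.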
    Activation Density: 0.140%
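    The density figure is usually read as the share of tokens on which the feature fires, meaning its activation is above zero. The sketch below assumes that definition. The function name `activation_density` and its `threshold` parameter are hypothetical, and the sample of 14 active tokens out of 10,000 is invented purely to show how a value like 0.140% arises.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation is strictly above `threshold`."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    # Guard against an empty input instead of dividing by zero.
    return 100.0 * active / total if total else 0.0


# Hypothetical sample: 14 firing tokens out of 10,000.
acts = [0.0] * 9_986 + [1.3] * 14
print(f"{activation_density(acts):.3f}%")  # 0.140%
```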

    No Known Activations