INDEX
    Explanations

    terms related to comparisons or similarities between different entities

    Negative Logits
    Token        Logit
    " alkoh"     -1.11
    " lele"      -0.96
    " oner"      -0.94
    " fta"       -0.93
    " nece"      -0.92
    " antik"     -0.91
    " kac"       -0.89
    " igno"      -0.89
    " uhr"       -0.88
    " ert"       -0.88
    Positive Logits
    Token          Logit
    "same"         0.77
    " same"        0.77
    " Same"        0.73
    "Same"         0.68
    "SAME"         0.64
    " applies"     0.63
    " الاطلاع"     0.54
    " apply"       0.54
    " SAME"        0.53
    " Similarly"   0.53
    Activation Density: 0.118%

    No Known Activations