INDEX
    Explanations

    phrases indicating similarity or comparison

    Negative Logits
    "es"        -0.68
    "dymyr"     -0.61
    " sto"      -0.60
    "cupa"      -0.60
    "aphne"     -0.59
    "ate"       -0.58
    "o"         -0.57
    " ste"      -0.57
    " sphinct"  -0.56
    "arbox"     -0.56
    Positive Logits
    " Similar"         1.27
    "Similar"          1.25
    " SIMILAR"         1.23
    " similar"         1.22
    "RectangleBorder"  1.21
    "similar"          1.18
    " nahilalakip"     1.12
    "Похо"             1.10
    "iliar"            1.08
    " simil"           1.01
    Activation Density: 0.101%

    No Known Activations