    Explanations

    Phrases indicating similarity or comparison

    Negative Logits
    Token (quoted; a leading space is part of the token)
    'onso'              -0.56
    'es'                -0.56
    '!("{'              -0.52
    ' Nase'             -0.50
    ' Græ'              -0.50
    'lección'           -0.50
    'arbox'             -0.50
    ' ulang'            -0.49
    'といけない'         -0.49
    'assigns'           -0.49
    Positive Logits
    Token (quoted; a leading space is part of the token)
    ' nahilalakip'       1.07
    ' similar'           1.06
    ' SIMILAR'           1.04
    'RectangleBorder'    1.02
    'Similar'            1.01
    ' Similar'           1.01
    'similar'            1.00
    'Похо'               0.97
    'iliar'              0.92
    ' simil'             0.92
    Activation Density: 0.175%
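    Dashboards of this kind commonly produce the logit lists by projecting the feature's decoder direction onto each token's unembedding vector and sorting the scores, and they report activation density as the percentage of token positions on which the feature fires. The sketch below assumes that convention. The function names, the toy vocabulary, the vectors, and the zero firing threshold are all hypothetical and are not taken from this page.

```python
# Hypothetical sketch of how logit lists and activation density are often derived.
# All names, vectors, and the threshold are illustrative assumptions.

def logit_effects(decoder_vec, unembed_rows, vocab):
    """Score each token by the dot product of the feature's decoder direction
    with that token's unembedding vector (one vector per vocab entry)."""
    return {
        token: sum(d * u for d, u in zip(decoder_vec, row))
        for token, row in zip(vocab, unembed_rows)
    }

def top_and_bottom(scores, k):
    """Return the k highest-scoring (positive) and k lowest-scoring (negative) tokens."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return ranked[-k:][::-1], ranked[:k]

def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds the threshold."""
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy usage with made-up numbers.
vocab = [" similar", "assigns", "es"]
unembed_rows = [[1.0, 0.2], [-0.3, -0.4], [0.1, -1.0]]
decoder_vec = [1.0, 0.5]

scores = logit_effects(decoder_vec, unembed_rows, vocab)
positive, negative = top_and_bottom(scores, k=1)
density = activation_density([0.0, 0.2, 0.0, 0.0])
```

    Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; a real model would replace the nested loops with a single matrix-vector product over the full vocabulary.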

    No Known Activations