    Explanations

    phrases that express comparisons and similarities between experiences or concepts

    Negative Logits
    Token          Logit
    "mability"     -0.67
    "Джерела"      -0.59
    "明明"         -0.57
    " fubject"     -0.57
    " purpoſe"     -0.57
    " Silla"       -0.56
    "Enllaços"     -0.55
    " diſt"        -0.55
    "tetten"       -0.55
    "losis"        -0.55
    Positive Logits
    Token          Logit
    " like"        0.76
    "enumii"       0.72
    " Like"        0.65
    "like"         0.64
    "Like"         0.63
    " kuten"       0.59
    " seperti"     0.58
    " giống"       0.57
    " LIKE"        0.57
    " kuin"        0.57
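    Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most positive and most negative entries. The sketch below assumes that convention; the function names (logit_effects, top_k), the toy matrix, and the four-token vocabulary are hypothetical, and a real pipeline may also apply layer normalization or other scaling first.

```python
# Hypothetical sketch: rank a feature's direct effect on output tokens.
# Assumes the decoder direction (length d_model) and the unembedding
# matrix W_U (d_model rows x n_vocab columns) are plain Python lists.

def logit_effects(decoder_dir, unembed, vocab):
    """Return {token: dot(decoder_dir, column j of W_U)} for every token j."""
    d_model = len(decoder_dir)
    effects = {}
    for j, token in enumerate(vocab):
        effects[token] = sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
    return effects

def top_k(effects, k, largest=True):
    """Return the k (token, value) pairs with the largest (or smallest) values."""
    return sorted(effects.items(), key=lambda kv: kv[1], reverse=largest)[:k]

# Toy example: 2-dimensional residual stream, 4-token vocabulary (made up).
vocab = [" like", " Like", " purpoſe", "mability"]
unembed = [
    [0.9, 0.7, -0.2, -0.8],  # row 0 of W_U
    [0.1, 0.2, -0.6, -0.1],  # row 1 of W_U
]
decoder_dir = [1.0, 0.5]

effects = logit_effects(decoder_dir, unembed, vocab)
print("positive:", top_k(effects, 2, largest=True))
print("negative:", top_k(effects, 2, largest=False))
```

    Sorting the whole vocabulary is fine for a sketch; for a real vocabulary of tens of thousands of tokens, a partial selection such as heapq.nlargest would avoid a full sort.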
    Activation Density: 0.520%
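    An activation density of 0.520% means the feature fires on roughly 5.2 of every 1,000 tokens. The sketch below assumes density is measured as the fraction of token positions with a strictly positive activation; the function name activation_density and the toy data (13 firing positions out of 2,500 tokens) are hypothetical.

```python
# Hypothetical sketch: activation density as the share of token positions
# on which a feature's activation is strictly positive.

def activation_density(activations):
    """Fraction of positions where the activation is > 0."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return firing / len(activations)

# Toy data: 2,500 tokens, the feature fires at positions 0, 200, ..., 2400
# (13 positions in total).
acts = [0.0] * 2500
for pos in range(0, 2500, 200):
    acts[pos] = 1.3

density = activation_density(acts)
print(f"Activation density: {density:.3%}")  # prints "Activation density: 0.520%"
```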

    No Known Activations