    Explanations

    comparisons using the word "like" to describe similarities between different things

    Negative Logits
    Token            Logit
    " unspeak"       -0.56
    " indescri"      -0.53
    " cuck"          -0.51
    " beaut"         -0.51
    " beaute"        -0.51
    "kaç"            -0.50
    " cushi"         -0.50
    " Medea"         -0.49
    "dateOfBirth"    -0.47
    "czegóły"        -0.47
    Positive Logits
    Token            Logit
    " like"          0.80
    " LIKE"          0.75
    "LIKE"           0.73
    " nagu"          0.73
    " affez"         0.71
    "like"           0.70
    " trover"        0.70
    " mosso"         0.67
    " jät"           0.67
    " preghi"        0.66
    Activation Density: 0.111%

    No Known Activations