    Explanations

    phrases expressing gratitude and recommendations

    NEGATIVE LOGITS
    "zarchiwizowane"  -0.66
    "Lähteet"         -0.64
    " Interesting"    -0.60
    "Interesting"     -0.59
    " interest"       -0.59
    "Iné"             -0.56
    "ValueStyle"      -0.56
    " derog"          -0.54
    " interesting"    -0.54
    "Interess"        -0.54
    POSITIVE LOGITS
    " couldn"         0.82
    " truly"          0.80
    " hâte"           0.73
    "truly"           0.72
    " verkligen"      0.69
    "couldn"          0.67
    "Truly"           0.66
    " Truly"          0.66
    "本当"              0.65
    " litté"          0.65
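Dashboards of this kind commonly build the positive and negative logit lists by projecting the feature's decoder direction onto the model's unembedding matrix, then taking the largest and smallest entries. The sketch below assumes that definition. The names `decoder_dir`, `unembed`, `logit_effects`, and `top_and_bottom` are hypothetical, and the toy numbers in the usage example are made up: they do not reproduce the values listed for this feature.

```python
# Hypothetical sketch: rank tokens by how strongly a feature's decoder
# direction raises or lowers their logits, assuming effect(t) = d . W_U[:, t].
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the decoder direction with each token's unembedding column."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return (k most promoted tokens, k most suppressed tokens)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative effect first
    positive = ranked[::-1][:k]    # most positive effect first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model; all numbers are illustrative only.
    toy_unembed = {
        " truly": [0.6, 0.4],
        " couldn": [0.7, 0.3],
        " interesting": [-0.4, -0.3],
        "Lähteet": [-0.5, -0.2],
    }
    pos, neg = top_and_bottom(logit_effects([1.0, 0.5], toy_unembed), k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Leading spaces are kept inside the token strings because, as in the lists above, " truly" and "truly" are distinct vocabulary entries.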
    Activation density: 0.160%

    No Known Activations
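Activation density is usually read as the share of sampled tokens on which the feature fires. Under that assumption, 0.160% corresponds to about 16 tokens in every 10,000, or roughly 1 token in 625 (since 1 / 0.0016 = 625). The sketch below computes it; the function name and the `threshold` parameter (default 0, meaning "any positive activation") are hypothetical choices, not part of the dashboard.

```python
# Hypothetical sketch: activation density as the percentage of tokens whose
# feature activation exceeds a threshold (assumed to be 0 here).
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens with activation strictly above `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Illustrative data: 16 active tokens out of 10,000 sampled tokens.
    sample = [0.0] * 9984 + [1.0] * 16
    print(f"Activation density: {activation_density(sample):.3f}%")
```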