INDEX
    Explanations

    phrases expressing personal preferences and likes

    Negative Logits
    Token           Logit
    "acco"          -0.18
    "line"          -0.17
    "iste"          -0.17
    "ista"          -0.16
    "sel"           -0.16
    "ils"           -0.16
    "/Linux"        -0.16
    "shed"          -0.15
    " slightest"    -0.15
    "tes"           -0.15
    Positive Logits
    Token           Logit
    "-minded"       0.25
    "/dis"          0.24
    "able"          0.21
    " minded"       0.20
    "/lo"           0.20
    "WISE"          0.17
    " unto"         0.17
    " latter"       0.17
    "elihood"       0.16
    "ably"          0.16
    Activation Density: 0.080%

    No Known Activations