    Explanations

    positive descriptions of actions or qualities

    Negative Logits
        <bos>         -3.45
        intersper     -2.18
        encomp        -1.99
        shenan        -1.76
        inconce       -1.75
        reluct        -1.71
        unspeak       -1.71
        hairc         -1.70
        indestru      -1.70
        impra         -1.65

    Positive Logits
        asfal          1.05
        torba          0.99
        utop           0.99
        tyn            0.99
        ortop          0.99
        sement         0.97
        ananas         0.96
        sonda          0.95
        balon          0.95
        benzin         0.94
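    Dashboards of this kind commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below shows that projection on toy numbers. It is an assumption about how such lists are produced, not this page's actual pipeline; the function name logit_effects, the weights, and the four-token vocabulary are all hypothetical.

```python
def logit_effects(decoder_dir, W_U, vocab, k=3):
    """Rank vocabulary tokens by how much a feature direction pushes their logits.

    Hypothetical helper. decoder_dir has length d_model; W_U is laid out as
    d_model rows by vocab_size columns, so score[j] = sum_i decoder_dir[i] * W_U[i][j].
    """
    d_model = len(decoder_dir)
    scores = [
        sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending: most suppressed tokens first, most promoted tokens last.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # k most negative logit effects
    positive = ranked[::-1][:k]    # k most positive logit effects
    return negative, positive


# Toy example with made-up weights (not taken from any real model).
decoder_dir = [1.0, -0.5]
W_U = [
    [0.2, -1.0, 0.5, 0.0],
    [0.4, 1.0, -0.6, 0.1],
]
vocab = ["a", "b", "c", "d"]
neg, pos = logit_effects(decoder_dir, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

    On real models the projection is usually done with a single matrix-vector product; the explicit loops here just make each dot product visible.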
    Activation Density: 2.013%
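    Activation density is typically the percentage of sampled token positions at which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0.0; both the threshold and the helper name activation_density are hypothetical choices for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds threshold.

    Hypothetical helper; assumes one activation value per sampled token.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy activations over 10 tokens: the feature fires on 2 of them.
acts = [0.0, 1.2, 0.0, 0.0, 3.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print(activation_density(acts))
```

    Under this definition, a density of 2.013% means the feature is nonzero on roughly 2 of every 100 sampled tokens.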

    No Known Activations