INDEX
    Explanations

    phrases related to actions or emotional experiences

    Negative Logits
    Token             Logit
    .scalablytyped    -0.17
    addir             -0.17
     nues             -0.16
     Ñģон             -0.16
    porno             -0.16
    ipsis             -0.15
     Naked            -0.15
    atoria            -0.15
    orang             -0.15
    kees              -0.15
    Positive Logits
    Token             Logit
    aving             0.18
    ossa              0.16
     Samar            0.14
    WithContext       0.14
    ansa              0.14
    oon               0.14
    gin               0.13
    inne              0.13
     Ala              0.13
     illicit          0.13
    Activation Density: 0.062%

    No Known Activations