INDEX
    Explanations

    positive descriptors related to experiences

    physical states and actions

    Negative Logits
     Dasar         -0.30
     verdaderas    -0.29
     silêncio      -0.28
     zarar         -0.27
     medlemmer     -0.27
     davran        -0.26
     verdaderos    -0.26
     sagesse       -0.26
     goutte        -0.26
     förs          -0.25
    Positive Logits
    linawan        0.68
    AndEndTag      0.65
    niſſe          0.65
     queſto        0.65
    ftagPool       0.62
                   0.61
     ſehen         0.60
     geſch         0.60
                   0.60
     ſeinen        0.60
    Act Density 0.021%

    No Known Activations