INDEX
    Explanations

    Words that convey authenticity, truthfulness, or high-value attributes

    Negative Logits
        " separ"        -0.57
        " aza"          -0.51
        " Orient"       -0.49
        " Parse"        -0.49
        "piram"         -0.48
        " jungle"       -0.47
        " Osi"          -0.47
        " pisa"         -0.47
        " Christiane"   -0.47
        " ali"          -0.46
    Positive Logits
        " viņ"           0.54
        "berdayakan"     0.47
        " vocês"         0.47
        " graduación"    0.46
        " lección"       0.45
        " незавершена"   0.45
        " lentejuelas"   0.44
        " bermanfaat"    0.44
        " pierna"        0.44
        " Cheese"        0.43
    Act Density 0.227%

    No Known Activations