    Explanations

    strong expressions of moral and ethical guidelines

    Negative Logits
        "GOTREF"          -0.57
        "colourful"       -0.50
        "PerformLayout"   -0.49
        "Cuántos"         -0.49
        "łada"            -0.48
        "Already"         -0.48
        " colourful"      -0.48
        " sparsely"       -0.48
        "書館"            -0.47
        " neus"           -0.47
    Positive Logits
        " absolute"        0.86
        " absolutely"      0.86
        " ABSOL"           0.85
        " assolutamente"   0.84
        "absolutely"       0.82
        " absoluto"        0.82
        "ABSOL"            0.81
        " absolut"         0.80
        " Absolutely"      0.79
        "Absolutely"       0.75
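One common way to produce token lists like the two above is to score every vocabulary token by the feature's direct effect on its logit, then keep the lowest and highest scores. The sketch below assumes, without confirmation from this page, that the score for a token is the dot product of the feature's decoder direction with that token's unembedding vector. `direct_logit_effect` and `top_logits` are hypothetical helpers, not a library API, and the toy vectors are made up.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: a token's score is the dot product of the feature's decoder
# direction with that token's unembedding vector.


def direct_logit_effect(decoder_dir, unembed_rows):
    """Return one score per token; unembed_rows[i] is token i's unembedding vector."""
    return [sum(d * w for d, w in zip(decoder_dir, row)) for row in unembed_rows]


def top_logits(scores, vocab, k=10):
    """Return (most negative, most positive) lists of (token, score) pairs."""
    pairs = list(zip(vocab, scores))
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    return negative, positive


if __name__ == "__main__":
    # Toy 2-d example; vectors are invented for illustration only.
    vocab = [" absolute", "GOTREF", " colourful", " the"]
    unembed_rows = [[1.0, 0.0], [-1.0, 0.5], [-0.5, 0.0], [0.0, 0.1]]
    decoder_dir = [0.9, 0.2]
    scores = direct_logit_effect(decoder_dir, unembed_rows)
    neg, pos = top_logits(scores, vocab, k=2)
    print("negative:", [t for t, _ in neg])  # ['GOTREF', ' colourful']
    print("positive:", [t for t, _ in pos])  # [' absolute', ' the']
```

Keeping the leading space in token strings matters: " colourful" and "colourful" are distinct vocabulary entries, which is why both can appear in the same list.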
    Activation Density: 0.266%
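Activation density is usually read as the fraction of token positions on which the feature is active. Under that reading, 0.266% means the feature fires on roughly 2.66 of every 1,000 tokens. A minimal sketch, assuming "active" means an activation strictly above zero (the page does not state its threshold); `activation_density` is a hypothetical helper:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is > threshold.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Toy data: 3 active positions out of 1,000 tokens.
    acts = [0.0] * 997 + [1.2, 0.4, 2.5]
    print(f"{activation_density(acts):.3%}")  # 0.300%
```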

    No Known Activations