INDEX
    Explanations

    phrases or terms associated with specific physical sensations or actions

    Negative Logits
     Anſ
    -0.91
     Theſe
    -0.85
     Reſ
    -0.85
    úsqueda
    -0.83
     myſelf
    -0.79
     Houſe
    -0.78
     Diſ
    -0.77
     pleaſure
    -0.75
     Eſ
    -0.75
     Efq
    -0.74
    Positive Logits
     la
    0.70
     si
    0.64
     pe
    0.63
     care
    0.61
     de
    0.61
     sa
    0.56
     une
    0.50
     sau
    0.50
     mas
    0.49
     co
    0.49
    Activation Density 0.029%

    No Known Activations