INDEX
    Explanations

    phrases related to accountability and consequences for actions

    Negative Logits
     purpoſe      -0.86
     Theſe        -0.83
     myſelf       -0.78
     Monfieur     -0.75
     Inſ          -0.74
     Diſ          -0.74
     pleaſure     -0.73
     ſtate        -0.72
     iſt          -0.70
    rungsseite    -0.70
    Positive Logits
     its           0.60
     @"/           0.59
     their         0.55
     своей         0.55
     having        0.54
     suoi          0.51
     vì            0.49
    ésia           0.49
                   0.48
    née            0.48
    Activation Density: 0.289%

    No Known Activations