INDEX
    Explanations

    phrases that express purpose or intent

    Negative Logits
     Jefus       -0.90
     Diſ         -0.79
     Anſ         -0.73
     Theſe       -0.73
     himſelf     -0.71
     pleaſure    -0.71
     Conſ        -0.71
     uſe         -0.69
     Efq         -0.68
     Cæsar       -0.68
    Positive Logits
     nakalista   0.67
     order       0.66
     inorder     0.66
     Afin        0.65
     afin        0.63
     better      0.62
     avoiding    0.58
    帖最后由      0.57
     enabling    0.57
     чтобы       0.57
    Activation Density: 0.060%

    No Known Activations