    Explanations

    names of people and places, particularly in a political context

    Negative Logits
     myſelf     -0.98
     pleaſure   -0.93
     ſtate      -0.92
    ſelf        -0.92
     itſelf     -0.90
     houſe      -0.90
     faſt       -0.88
     Efq        -0.86
     raiſ       -0.86
     purpoſe    -0.85
    Positive Logits
    aarrggbb    0.58
    ,           0.51
     u          0.48
     m          0.48
    /           0.47
     wa         0.47
    ?           0.47
     i          0.46
     y          0.46
     im         0.46
    Act Density 0.016%

    No Known Activations