    Explanations

    Nationalities/Groups

    Negative Logits
    " itſelf"      -1.53
    " myſelf"      -1.44
    " purpoſe"     -1.42
    " Houſe"       -1.41
    " houſe"       -1.40
    " Anſ"         -1.38
    " pleaſure"    -1.37
    " Monfieur"    -1.36
    " ſtate"       -1.31
    " reaſon"      -1.30
    Positive Logits
    ","                    0.73
    "."                    0.71
    " to"                  0.71
    " in"                  0.69
    " direct"              0.69
    " ("                   0.67
    [token not rendered]   0.67
    " from"                0.66
    "<eos>"                0.65
    " on"                  0.62
    Activation Density: 0.177%

    No Known Activations