    Explanations
    NEGATIVE LOGITS
    Token                Logit
    " Houſe"             -1.11
    " Monfieur"          -1.10
    " Jefus"             -1.09
    " Anſ"               -1.09
    "CloseOperation"     -1.08
    "AndEndTag"          -1.08
    " autorytatywna"     -1.07
    " RIPRODUZIONE"      -1.07
    " Theſe"             -1.07
    " myſelf"            -1.07
    POSITIVE LOGITS
    Token       Logit
    " es"       0.61
    ","         0.59
                0.59
    " to"       0.57
                0.57
    " state"    0.55
    "like"      0.53
    " ro"       0.52
    "."         0.52
    " ("        0.51
    Activation Density: 0.336%

    No Known Activations