INDEX
    Explanations

    phrases introducing examples or comparisons

    Negative Logits  (tokens quoted; a leading space is part of the token)
    "Carriera"         -0.62
    " båda"            -0.54
    " XNUMX"           -0.53
    "AndEndTag"        -0.52
    "UnusedPrivate"    -0.52
    " Damit"           -0.50
    "Damit"            -0.50
    " Jefus"           -0.50
    " ſame"            -0.49
    " ſeveral"         -0.49

    Positive Logits
    " those"            1.28
    "those"             1.03
    ":"                 0.89
    " ours"             0.87
    " namely"           0.86
    "גון"               0.84
    " Those"            0.84
    "namely"            0.81
    "เช่น"              0.81
    " celles"           0.80
    Activation density: 0.418%
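The activation density figure is presumably the share of sampled token positions on which this feature fires. The sketch below rests on an assumption: density is taken as (positions with activation > threshold) / (all positions sampled), with a threshold of 0. The function name `activation_density` and that threshold are hypothetical illustrations, not the page's documented method, and the token sample behind the reported 0.418% is not stated.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Hypothetical helper: assumes density = active positions / all positions.
    The threshold and token sample used for the dashboard figure are unknown.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: 2 active positions out of 500 -> 0.4% density.
acts = [0.0] * 498 + [1.3, 0.7]
print(f"{activation_density(acts):.3%}")  # 0.400%
```

At this scale, a density of 0.418% would mean the feature is active on roughly 4 out of every 1,000 tokens.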

    No Known Activations