INDEX
    Explanations

    punctuation marks and special characters

    Negative Logits
    Token           Logit
     Fé             -0.82
     West           -0.77
     Ade            -0.76
     Gund           -0.73
    West            -0.71
    Ade             -0.69
    "+ [newline]    -0.69
     Amé            -0.67
     Hilde          -0.67
     trin           -0.66
    Positive Logits
    Token           Logit
    ″]              1.28
    }]              1.23
    _]              1.14
    "]              1.12
    \"]             1.09
    })]             1.07
    rfloor          1.07
    ]]              1.06
     "]             1.04
     ]              1.02
    Activation Density: 0.236%

    No Known Activations