INDEX
    NEGATIVE LOGITS
    Token              Logit
    "<bos>"            -2.96
    [blank]            -1.05
    [blank]            -0.98
    "<?"               -0.98
    "<?\n"             -0.81
    "/**"              -0.81
    "/*"               -0.74
    "/***\n"           -0.71
    "żdy"              -0.69
    " rehabilitate"    -0.69

    POSITIVE LOGITS
    Token              Logit
    " Smith"            1.53
    "Smith"             1.47
    " smith"            1.28
    " SMITH"            1.28
    "SMITH"             1.26
    " Smiths"           1.21
    "smith"             1.13
    " thuy"             0.99
    " Bulgar"           0.90
    " Sén"              0.88

    Activation Density: 0.098%

    No Known Activations