INDEX
    NEGATIVE LOGITS
    " \"                              -2.25
    " blatantly"                      -2.14
    "There"                           -2.13
    " When"                           -2.03
    " Despite"                        -2.02
    " spurred"                        -1.99
    ";" (plus trailing whitespace)    -1.98
    "}"                               -1.97
    " subtly"                         -1.95
    (whitespace only)                 -1.93
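    POSITIVE LOGITS
    " semelh"                         2.38
    (token missing)                   2.23
    (token missing)                   2.11
    (token missing)                   2.11
    (token missing)                   2.11
    (token missing)                   2.09
    " tajem"                          2.09
    " perigo"                         2.06
    (token missing)                   2.06
    (token missing)                   2.05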
    Activation Density: 0.003%
    No Known Activations