    Negative Logits
        " better"        -0.84
        " more"          -0.84
        " some"          -0.82
        " fortsetter"    -0.81
        " reportedly"    -0.80
        "some"           -0.79
        " alcuni"        -0.79
        "olverine"       -0.78
        (blank)          -0.78
        " above"         -0.77
    Positive Logits
        " Gln"           1.01
        " ,\\"           0.99
        " ,\n"           0.94
        "ткий"           0.93
        (blank)          0.92
        " )\n"           0.91
        " Faça"          0.91
        " ,"             0.90
        " Daarnaast"     0.88
        ">;"             0.88
    Activation density: 0.038%

    No known activations.