    Negative Logits
        " We"                 -2.58
        "\"                   -2.48
        " a"                  -2.42
        " is"                 -2.31
        (whitespace token)    -2.19
        " need"               -2.14
        (token not rendered)  -2.13
        " apolog"             -2.08
        " no"                 -2.06
        " use"                -2.03
    Positive Logits
        " juſt"               2.64
        (token not rendered)  2.44
        ",  "                 2.36
        (token not rendered)  2.34
        " uſed"               2.33
        " gadis"              2.28
        " mawar"              2.28
        " abſ"                2.27
        " cristianos"         2.22
        " ſeveral"            2.22
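A minimal sketch of how ranked logit lists like the two above are commonly produced, assuming the usual approach of projecting the feature's decoder direction through the model's unembedding matrix: a positive value means the feature raises that token's logit, a negative value means it lowers it. The names `decoder_dir`, `unembed`, `logit_effects`, and `top_and_bottom`, and all toy numbers and token names, are hypothetical and not taken from this page.

```python
def logit_effects(decoder_dir, unembed):
    """Dot the feature's decoder direction with every vocabulary column.

    decoder_dir: list of d_model floats (hypothetical decoder row for one feature)
    unembed: d_model rows, each a list of d_vocab floats (hypothetical unembedding)
    Returns one logit effect per vocabulary token.
    """
    d_vocab = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(d_vocab)
    ]


def top_and_bottom(effects, vocab, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    pairs = list(zip(vocab, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy model: d_model = 2, d_vocab = 4 (all values hypothetical).
vocab = ["tok0", "tok1", "tok2", "tok3"]
decoder_dir = [1.0, -0.5]
unembed = [
    [-1.0, 2.0, -0.5, 1.0],
    [2.0, -1.0, 1.0, 0.0],
]
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, vocab, k=2)
print(positive)  # [('tok1', 2.5), ('tok3', 1.0)]
print(negative)  # [('tok0', -2.0), ('tok2', -1.0)]
```

A real dashboard would use the full vocabulary and typically k = 10, matching the ten entries per list shown here.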
    Activation Density 0.084%
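Activation density is read here as the share of sampled token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above 0.0 over some token sample; the threshold, the function name `activation_density`, and the toy data are assumptions for illustration only.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions where the feature fires.

    A position counts as firing when its activation is strictly above
    `threshold` (0.0 here, an assumed convention).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy sample: 84 firing positions out of 100,000 tokens (hypothetical data).
acts = [0.0] * 100_000
for i in range(84):
    acts[i * 1000] = 1.0  # arbitrary positive activation values
density = activation_density(acts)
print(f"{density:.3f}%")  # prints 0.084%
```

At 0.084%, a feature is active on roughly 84 of every 100,000 tokens, or about 1 in 1,190.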

    No Known Activations