    Explanations

    qualities are described

    Negative Logits
    0.44  " Ultimately"
    0.38  " puissent"
    0.38  " poiché"
    0.38  "まれた"
    0.38  " Primarily"
    0.38  "Ultimately"
    0.37  " אך"
    0.37  " Apparently"
    0.37  " પરંતુ"
    0.36  " بتوان"
    Positive Logits
    0.96  " είναι"
    0.68  " është"
    0.68  " is"
    0.65  "是非常"
    0.65  " deserves"
    0.64  " sucks"
    0.64  "简直"
    0.63  " feels"
    0.62  " är"
    0.61  "真的是"
    Activation Density: 0.025%

    No Known Activations