INDEX
    Explanations
    Negative Logits
        " proud"          -0.09
        ".used"           -0.08
        " clay"           -0.08
        (not rendered)    -0.08
        (not rendered)    -0.08
        "736"             -0.07
        (not rendered)    -0.07
        " aspiring"       -0.07
        (not rendered)    -0.07
        " వీ"             -0.07
    Positive Logits
        "worthiness"       0.14
        "worthy"           0.11
        (not rendered)     0.09
        "fulness"          0.09
        "ably"             0.09
        " Horses"          0.08
        " kokoa"           0.08
        "edit"             0.08
        "ful"              0.08
        "-benar"           0.08
    Activation Density: 0.014%

    No Known Activations