INDEX
    Explanations

    names of people and their descriptions

    Negative Logits
    "u"                  0.77
    "א"                  0.73
    ","                  0.70
    "다음"               0.66
    (token not shown)    0.65
    "in"                 0.63
    "and"                0.62
    "f"                  0.62
    "ने"                 0.61
    "в"                  0.61
    Positive Logits
    " "                  1.02
    " ("                 0.63
    (token not shown)    0.57
    " carrito"           0.53
    (token not shown)    0.52
    " grumpy"            0.51
    "illä"               0.49
    " waged"             0.48
    (token not shown)    0.48
    "önet"               0.48
    Activation Density: 0.008%

    No Known Activations