    Explanations

    words and phrases related to roles and classifications within various groups

    Negative Logits
    Token              Logit
    " cherchés"        -0.71
    "were"             -0.64
    "Were"             -0.58
    " gdyby"           -0.57
    "Has"              -0.55
    "extAlignment"     -0.55
    " Has"             -0.54
    " Were"            -0.53
    " theyre"          -0.52
    "سمبر"             -0.51
    Positive Logits
    Token              Logit
    " love"            0.92
    " often"           0.88
    " rarely"          0.81
    " tend"            0.81
    " seldom"          0.76
    " spend"           0.75
    " typically"       0.75
    " learn"           0.71
    " shouldn"         0.69
    " LOVE"            0.69
    Activation Density: 0.497%
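
The source does not define how the density figure above is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature fires,
    not the mean activation value.
    """
    if not activations:
        raise ValueError("activations must contain at least one value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1,000 token positions, 5 of which fire.
sample = [0.0] * 995 + [0.3, 1.2, 0.7, 2.1, 0.05]
print(f"{activation_density(sample):.3%}")  # prints 0.500%
```

Under this reading, a density of 0.497% would mean the feature fires on roughly 5 out of every 1,000 sampled token positions.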

    No Known Activations