INDEX
    Explanations

    references to individuals and their actions, particularly focusing on pronouns and related verbs

    Negative Logits
    " lato"       -0.47
    "riction"     -0.47
    "èvement"     -0.46
    "AIRE"        -0.46
    "ària"        -0.45
    "ourites"     -0.45
    "‍♀️"          -0.45
    "yf"          -0.44
    "oupe"        -0.43
    "ariato"      -0.43
    Positive Logits
    " theirs"     1.06
    "Their"       0.85
    " his"        0.83
    " they"       0.83
    "His"         0.81
    " Their"      0.81
    " hers"       0.80
    "They"        0.80
    " THEY"       0.79
    " They"       0.78
    Activation Density 0.373%

    No Known Activations