INDEX
    Explanations

    references to male and female pronouns, indicating discussions about specific individuals

    pronoun followed by verb

    Negative Logits
    featureID  -0.72
    ロウィン  -0.65
    IntoConstraints  -0.60
     müſſen  -0.58
     <=",  -0.57
     メンテナ  -0.56
     ddelwed  -0.56
     Weiſe  -0.56
    <unused14>  -0.56
    <unused41>  -0.56
    Positive Logits
    UnusedPrivate  0.43
    enderror  0.40
    Tembelea  0.35
    tamment  0.32
    AddTagHelper  0.32
    flashdata  0.32
     Nachfolger  0.32
     tartalomajánló  0.32
     nakalista  0.30
    Predecesor  0.28
    Activation Density: 0.009%

    No Known Activations