INDEX
    Explanations

    pronoun followed by verb

    Negative Logits
    ction        0.57
    ção          0.52
     assim       0.52
    𝐏            0.52
    etary        0.50
     sostiene    0.50
    𝐀            0.49
    cción        0.49
    𝐍            0.49
    pped         0.49
    Positive Logits
    ר            0.79
    т            0.73
     Lordships   0.66
    tis          0.66
    self         0.65
    d            0.65
    ח            0.63
    (token not rendered)  0.59
    k            0.58
    (token not rendered)  0.58
    Act Density 0.210%

    No Known Activations