INDEX
    Explanations

    pronouns and demonstrative articles

    Negative Logits
    " Houſe"       -0.96
    "wiſe"         -0.90
    " Савезне"     -0.88
    " houſe"       -0.88
    "Controllo"    -0.87
    " Anſ"         -0.84
    " ―――――"       -0.84
    " myſelf"      -0.83
    " pleaſure"    -0.82
    " Reſ"         -0.81
    Positive Logits
    " he"      0.96
    " He"      0.95
    " It"      0.93
    " it"      0.87
    " They"    0.85
    "he"       0.84
    " Det"     0.83
    "Det"      0.80
    " they"    0.80
    "He"       0.76
    Act Density 0.038%

    No Known Activations