INDEX
    Explanations

    references to pronouns that show connections between characters and their actions

    pronouns followed by prepositions

    Negative Logits
    Token         Logit
    " when"       -0.42
    " with"       -0.42
    " using"      -0.42
    " at"         -0.42
    " whose"      -0.39
    " a"          -0.39
    " of"         -0.38
    " the"        -0.38
    " in"         -0.36
    " by"         -0.36
    Positive Logits
    Token         Logit
    "majánló"     0.99
    " queſta"     0.98
    "<unused3>"   0.98
    "<unused79>"  0.97
    "<unused16>"  0.97
    "<unused8>"   0.97
    "<unused17>"  0.97
    "<unused23>"  0.97
    "<unused14>"  0.97
    "[@BOS@]"     0.97
    Activation Density: 0.040%

    No Known Activations