INDEX
    Explanations

    phrases indicating qualifications or conditions related to actions or events

    instances of the token "<bos>"

    Negative Logits
    Token           Logit
    " Houſe"        -0.51
    " herself"      -0.47
    " itſelf"       -0.46
    " economico"    -0.46
    "vieja"         -0.45
    " Alva"         -0.44
    "ąg"            -0.44
    " Pergamon"     -0.44
    " damska"       -0.44
    "jalá"          -0.42

    Positive Logits
    Token           Logit
    " there"        1.56
    " they"         1.33
    " it"           1.22
    "there"         1.20
    " we"           1.06
    " he"           0.97
    " THERE"        0.96
    " There"        0.90
    "There"         0.87
    " although"     0.85
    Activation Density: 0.139%
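
    The density figure above is usually read as the share of token positions on which the feature fires. The sketch below shows one way such a percentage could be computed, assuming "fires" means an activation strictly above a threshold of zero. The function name `activation_density`, the threshold, and the sample values are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumption: a position counts as active when its value is strictly
    greater than the threshold. The dashboard's exact rule is not stated.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations for 8 token positions (not data from this feature).
example = [0.0, 0.0, 0.7, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(example):.3f}%")  # prints "12.500%"
```

    Under this reading, a density of 0.139% would correspond to roughly 1 active position in every 720 sampled tokens.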

    No Known Activations