INDEX
    Explanations

    questions that begin with "Are"

    NEGATIVE LOGITS
    664       -0.16
    lore      -0.16
    eur       -0.15
    ills      -0.15
    rzy       -0.15
    orra      -0.15
    pery      -0.14
    ruc       -0.14
    ré        -0.14
    poons     -0.14

    POSITIVE LOGITS
    zzo        0.25
    ospace     0.25
    nda        0.23
    nds        0.22
    ady        0.21
    tha        0.20
    obic       0.18
    nts        0.18
    psilon     0.18
    tap        0.17

    Activation Density 0.033%

    No Known Activations