INDEX
    Explanations

    terms related to comparisons and alternatives

    Negative Logits
    fty       -0.18
    ven       -0.18
    koli      -0.17
    ryn       -0.17
    ray       -0.16
    ilar      -0.16
    rames     -0.16
    swers     -0.15
    rong      -0.15
    onders    -0.15
    Positive Logits
    ewise     0.36
    wis       0.35
    world     0.31
    wise      0.30
     than     0.30
     wise     0.30
    -wise     0.29
    -than     0.27
    WISE      0.26
    _than     0.26
    Act Density 0.073%

    No Known Activations