INDEX
    Explanations

    phrases indicating knowledge or information

    the phrase "that" in various contexts

    Negative Logits
    "orah"      -0.79
    "hens"      -0.73
    "ーク"      -0.72
    "obb"       -0.71
    "amia"      -0.66
    "orian"     -0.65
    "tails"     -0.64
    "ield"      -0.64
    "apolis"    -0.62
    "estern"    -0.61
    Positive Logits
    " pesky"     1.02
    "cher"       0.76
    " they"      0.75
    " there"     0.75
    " although"  0.71
    "same"       0.70
    " fateful"   0.70
    " whereas"   0.70
    " someday"   0.68
    " THEY"      0.67
    Activation Density: 0.203%

    No Known Activations