INDEX
    Explanations

    occurrences of the word "were."

    Negative Logits
    Token         Logit
    "ĥ½"          -2.00
    " Caption"    -1.75
    "²"           -1.73
    "gs"          -1.64
    "¹"           -1.63
    " hers"       -1.53
    " labels"     -1.50
    "¿½"          -1.46
    "wear"        -1.41
    "¥"           -1.38
    Positive Logits
    Token         Logit
    "eer"         1.60
    "cht"         1.58
    "afen"        1.54
    "ophe"        1.53
    "isco"        1.53
    "aul"         1.52
    "FPar"        1.48
    "riton"       1.40
    "olin"        1.40
    "opan"        1.39
    Activation Density: 0.108%

    No Known Activations