INDEX
    Explanations

    Phrases describing physical appearance or condition, especially clothing

    Negative Logits
    Token            Logit
    "Reviewer"       -0.75
    " Hasan"         -0.64
    "rers"           -0.62
    "vation"         -0.61
    " indirectly"    -0.59
    "ーテ"           -0.58
    " heads"         -0.58
    " Hamm"          -0.56
    " Scotia"        -0.55
    " exit"          -0.54

    Positive Logits
    Token            Logit
    "poke"           1.42
    "iege"           1.33
    "erker"          1.24
    "pect"           1.13
    "earchers"       1.01
    "peak"           0.95
    "aved"           0.95
    "oin"            0.94
    "erk"            0.93
    "hirt"           0.90
    Act Density 0.033%

    No Known Activations