INDEX
    Explanations

    comparisons of similarity or equality

    comparisons emphasizing equality or similarity

    Negative Logits
    "DIT"            -0.74
    "MAP"            -0.72
    "bryce"          -0.72
    "POST"           -0.70
    "mt"             -0.66
    "Ry"             -0.65
    "UL"             -0.64
    "runs"           -0.64
    "UCT"            -0.64
    "ULE"            -0.63

    Positive Logits
    "pired"           0.80
    "ptin"            0.77
    " scrut"          0.76
    " advertised"     0.75
    " vain"           0.75
    "iffe"            0.73
    " eloqu"          0.70
    "schild"          0.69
    "itzer"           0.67
    "iable"           0.66

    Act Density 0.036%

    No Known Activations