    Explanations

    comparisons or similarities between different concepts

    phrases that indicate similarity comparisons

    Negative Logits
     stoked  -0.69
    tun  -0.67
    raq  -0.67
     danced  -0.63
    gered  -0.63
    resy  -0.62
     helicop  -0.61
    uve  -0.60
     contrace  -0.60
     transitioned  -0.60
    Positive Logits
     ours  0.84
    lihood  0.78
    oxide  0.75
    ffee  0.71
    rium  0.69
    èª  0.67
     theirs  0.65
     those  0.62
     the  0.62
    traditional  0.62
    Act Density 0.174%

    No Known Activations