INDEX
    Explanations

    comparisons or similarities

    phrases that express similarity or comparisons

    Negative Logits
    Token         Logit
    "igion"       -0.83
    "inion"       -0.83
    "alez"        -0.76
    "Language"    -0.76
    "abases"      -0.74
    "IAL"         -0.74
    "ourse"       -0.74
    "chin"        -0.72
    " helicop"    -0.72
    "aft"         -0.69
    Positive Logits
    Token          Logit
    "lier"         0.92
    "liest"        0.88
    "lihood"       0.82
    " fir"         0.67
    " crap"        0.66
    " fun"         0.65
    " filler"      0.64
    " fireworks"   0.64
    " pus"         0.63
    " peas"        0.63
    Activation Density: 0.023%

    No Known Activations