    Explanations

    instances of the word "similar" as a descriptor or comparison

    Negative Logits
    Token           Logit
    "een"           -0.17
    "yp"            -0.17
    "ví"            -0.16
    "eln"           -0.16
    "essa"          -0.15
    "printStats"    -0.15
    "eer"           -0.15
    "eter"          -0.14
    "ngr"           -0.14
    " hete"         -0.14
    Positive Logits
    Token           Logit
    "-minded"       0.23
    "ily"           0.22
    "mente"         0.21
    "-sex"          0.20
    "teenth"        0.18
    "weise"         0.18
    "inded"         0.17
    " minded"       0.17
    "etto"          0.17
    "-looking"      0.17
    Act Density 0.029%

    No Known Activations