INDEX
    Explanations

    phrases that emphasize comparisons or similarities

    Negative Logits
    Token       Logit
    "準"        -0.16
    "bil"       -0.15
    "atti"      -0.15
    "krom"      -0.15
    "-UA"       -0.15
    "ettes"     -0.15
    "uyen"      -0.14
    "eks"       -0.14
    "zos"       -0.14
    "SCAN"      -0.14

    Positive Logits
    Token       Logit
    "asad"      0.18
    "unto"      0.15
    "ligt"      0.15
    " those"    0.14
    " γερι"     0.14
    "lige"      0.14
    "nier"      0.14
    "ney"       0.14
    " unto"     0.14
    " lí"       0.14

    Act Density 0.041%

    No Known Activations