    Explanations

    pairs followed by respectively

    Negative Logits
     Küsten  -0.82
     whet  -0.82
    Might  -0.82
    ish  -0.81
    -0.79
    可能会  -0.79
    žní  -0.78
     vissa  -0.77
    สอง  -0.76
    ngdoc  -0.76

    Positive Logits
    bows  0.98
    لاب  0.90
     territ  0.89
    DAYS  0.89
    kpop  0.87
    gernaut  0.85
    🏅  0.83
    babies  0.82
     bestemt  0.82
    続けて  0.82

    Activation Density: 0.019%

    No Known Activations