    Explanations

    references to collective human experiences and common social behaviors

    Negative Logits
        Token          Logit
        " doesn"       -0.85
        " não"         -0.83   (Portuguese "not")
        "doesn"        -0.77
        " didn"        -0.76
        " isn"         -0.74
        "weren"        -0.73
        " never"       -0.72
        " neither"     -0.72
        " Doesn"       -0.72
        " niet"        -0.71   (Dutch "not")
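    Positive Logits
        Token             Logit
        " đều"            1.07    (Vietnamese "all, both")
        " except"         0.95
        " individually"   0.89
        " alike"          0.86
        "except"          0.83
        " câte"           0.83    (Romanian distributive "each")
        " kecuali"        0.82    (Indonesian "except")
        " equally"        0.81
        "ล้ว"             0.81    (Thai subword)
        " sauf"           0.80    (French "except")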
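    The page does not say how these logit lists were produced. A common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder vector through the model's unembedding matrix and keep the highest- and lowest-scoring tokens. The minimal sketch below uses plain Python lists; `top_logit_effects`, the toy vectors, and the four-token vocabulary are hypothetical, and real pipelines may also account for the final LayerNorm.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    decoder_vec: Sequence[float],
    unembed: Sequence[Sequence[float]],  # d_model rows x vocab_size columns
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most promoted and k most suppressed tokens for one feature.

    Assumption (hypothetical): a token's score is the dot product of the
    feature's decoder vector with that token's unembedding column, with the
    final LayerNorm ignored.
    """
    scores = []
    for t, token in enumerate(vocab):
        # Dot product of the decoder vector with column t of the unembedding.
        score = sum(decoder_vec[j] * unembed[j][t] for j in range(len(decoder_vec)))
        scores.append((token, score))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]        # lowest scores: most suppressed tokens
    positive = ranked[::-1][:k]  # highest scores: most promoted tokens
    return positive, negative


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
decoder_vec = [1.0, -0.5]
unembed = [
    [1.0, 0.0, -1.0, 0.5],
    [0.0, 1.0, 0.0, 0.2],
]
vocab = [" alike", " doesn", " never", " except"]

positive, negative = top_logit_effects(decoder_vec, unembed, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

    With these toy numbers, " alike" and " except" come out as the most promoted tokens and " never" and " doesn" as the most suppressed, mirroring the shape (not the values) of the lists above.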
    Activation Density: 0.329% (share of tokens on which the feature activates)
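    Activation density is assumed here to be the percentage of sampled tokens on which the feature's activation is strictly above zero; the page states neither its sample size nor its threshold. The hypothetical helper `activation_density` below is a minimal sketch of that calculation, and the `threshold` parameter is an illustrative assumption.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation value")
    # Count tokens on which the feature fires above the threshold.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 329 active tokens out of 100,000.
sample = [1.0] * 329 + [0.0] * (100_000 - 329)
print(activation_density(sample))
```

    For this made-up sample the helper returns 0.329, the same form as the figure above; the real sample behind the page's number is unknown.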

    No Known Activations