INDEX
    Explanations
    Negative Logits
     jugement    0.70
     Vasa        0.70
     affection   0.68
                 0.66
                 0.66
     ngưỡng      0.66
                 0.66
     hoàng       0.66
    КА           0.65
     Init        0.64
    Positive Logits
    '            1.01
    ti           0.92
    ta           0.89
    uities       0.88
    unct         0.87
    el           0.86
    ur           0.83
    tau          0.82
    v            0.82
    اء           0.82
    Act Density 0.022%

    No Known Activations