INDEX
    Explanations

    expressing feelings, appreciation, or opinions

    Negative Logits
    (token not rendered)    0.93
    (token not rendered)    0.85
    "There"                 0.80
    "ı"                     0.78
    "ır"                    0.75
    "N"                     0.75
    "我们"                  0.74
    (token not rendered)    0.73
    " in"                   0.73
    " ("                    0.72
    Positive Logits
    "s"                     1.23
    "ς"                     0.94
    "่า"                     0.86
    "ים"                    0.80
    (token not rendered)    0.75
    "ی"                     0.70
    "sion"                  0.68
    (token not rendered)    0.68
    "าน"                    0.66
    "ли"                    0.65
    Act Density 0.043%

    No Known Activations