    Explanations

    "very" followed by descriptive adjectives

    Negative Logits
    Token    Logit
    ق        1.78
    ب        1.75
    ו        1.73
    o        1.70
    กว่า     1.61
    بوت      1.59
    د        1.57
    ರ್ಧ      1.52
    ות       1.51
    ن        1.43
    Positive Logits
    Token          Logit
    (not shown)    1.63
    }$             1.55
    (not shown)    1.47
    (not shown)    1.44
    OOL            1.41
    ı              1.41
    day            1.40
    (not shown)    1.37
    也非常         1.36
    ru             1.34
    Act Density 0.108%

    No Known Activations