INDEX
    Explanations

    intensifier for strong adjectives

    NEGATIVE LOGITS
    "Absolute"         0.86
    " Absolute"        0.80
    "2"                0.77
    "absolute"         0.75
    " absolute"        0.71
    " абсолю"          0.68
    (no token text)    0.64
    "DI"               0.64
    ":"                0.64
    "۲"                0.59
    POSITIVE LOGITS
    "ين"     0.90
    "ام"     0.89
    "ной"    0.86
    "ли"     0.85
    "ن"      0.80
    "v"      0.79
    "ва"     0.77
    "vq"     0.73
    "ів"     0.73
    "ку"     0.73
    Act Density 0.003%

    No Known Activations