    Explanations

    fairly + adjective/adverb

    Negative Logits
    Token             Logit
    "ar"              3.64
    "л"               3.62
    "ع"               3.44
    "на"              3.27
    "at"              3.25
    "u"               3.21
    "quele"           3.02
    "pped"            2.92
    "𝐒"               2.89
    " deformation"    2.89

    Positive Logits
    Token             Logit
    "्स"              3.76
    "ytale"           3.61
    "nce"             2.95
    "ی"               2.87
    " meisten"        2.78
    "ground"          2.72
    "nen"             2.55
    "ném"             2.54
    "нении"           2.54
    "зульта"          2.53

    Activation Density: 0.026%

    No Known Activations