    Explanations

    descriptive adjectives followed by nouns

    Negative Logits
    Token             Value
    " смысле"         0.32
    " quantifying"    0.31
    "່ວນ"              0.31
    " রাষ্ট্রীয়"         0.30
    "ाइवेट"            0.29
    "信仰"             0.29
    "寒い"             0.29
    " prinsip"        0.29
    "fungsi"          0.29
    "这个"             0.29
    Positive Logits
    Token             Value
    ","               0.31
    " plywood"        0.31
    " bakery"         0.29
    " models"         0.29
    "-"               0.27
    " walnut"         0.27
    " hotel"          0.27
    "ened"            0.27
    " motorcycle"     0.26
    " livestock"      0.26
    Activation Density: 0.613%

    No Known Activations