    Explanations

    instances of analogies and metaphors used for explanations

    Negative Logits
    Token        Logit
    "iw"         -0.15
    "ело"        -0.14
    "ÙĤاء"       -0.14
    "å¼ĥ"        -0.14
    "olang"      -0.14
    "angered"    -0.14
    "ONGO"       -0.13
    "wrap"       -0.13
    "à¥įसर"      -0.13
    " عز"        -0.13
    Positive Logits
    Token        Logit
    " example"   0.16
    "ubi"        0.15
    "uki"        0.15
    "ahi"        0.15
    " Ridley"    0.14
    "ä¾ĭ"        0.14
    " analogy"   0.14
    " bidi"      0.14
    "permalink"  0.14
    "929"        0.14
    Activation Density: 0.179%

    No Known Activations