INDEX
    Explanations

    key terms related to reasoning and justification in various contexts

    Negative Logits
    Has
    -0.17
    isko
    -0.15
     Has
    -0.15
    zh
    -0.14
    iso
    -0.14
    Makes
    -0.14
    -has
    -0.14
    ISO
    -0.14
     HAS
    -0.14
    Provides
    -0.13
    POSITIVE LOGITS
     is
    0.56
    的是
    0.51
     adalah
    0.40
    就是
    0.34
     are
    0.33
     是
    0.32
    のは
    0.31
    是在
    0.31
     was
    0.30
     là
    0.30
    Act Density 0.491%

    No Known Activations