INDEX
    Explanations

    verbs related to providing information or explanations

    instances where previous points or mentions are referenced

    Negative Logits
    wcs
    -0.76
    ãĤ¼ãĤ¦ãĤ¹
    -0.73
    OPE
    -0.70
    orest
    -0.70
    enez
    -0.69
     replica
    -0.68
    erate
    -0.68
    ctors
    -0.67
    ealous
    -0.65
    orah
    -0.65
    Positive Logits
     [|
    0.75
     Tale
    0.74
     Hacker
    0.71
     TOD
    0.71
     newsp
    0.70
     Hier
    0.68
     commenter
    0.67
     Mish
    0.67
     spoiler
    0.65
     âĺ
    0.65
    Activation Density: 0.182%

    No Known Activations