INDEX
    Explanations

    references to academic studies and their findings

    Negative Logits
    Token        Logit
    ".ov"        -0.15
    " elucid"    -0.14
    "--+"        -0.14
    "arkan"      -0.14
    "umpt"       -0.14
    " Laugh"     -0.13
    " Nom"       -0.13
    " ed"        -0.13
    "еÑĤи"       -0.13
    "ẫn"         -0.13
    Positive Logits
    Token        Logit
    " found"     0.45
    "found"      0.40
    "-found"     0.32
    " FOUND"     0.31
    "_found"     0.31
    " Found"     0.30
    " looked"    0.30
    "Found"      0.29
    "FOUND"      0.27
    "\tfound"    0.26
    Activation Density: 0.069%

    No Known Activations