    Explanations

    the start of new sections or significant changes in content within the text

    Negative Logits
    Token                 Logit
    "ThroughAttribute"    -0.81
    "NUMX"                -0.75
                          -0.67
    "клопе"               -0.64
    "Rhestr"              -0.64
    " وتسجيلات"            -0.61
    "bkz"                 -0.60
    "enderror"            -0.60
    "例文帳に追加"          -0.60
    " NSCoder"            -0.60
    Positive Logits
    Token                 Logit
    " we"                 0.52
    " coach"              0.51
    ","                   0.49
    "<bos>"               0.49
    " Rock"               0.48
    " but"                0.46
    " prez"               0.46
                          0.46
    " dad"                0.45
    "Prefix"              0.44
    Activation density: 0.092%

    No Known Activations