    Explanations

    phrases indicating scientific agreement or conclusions in research papers

    Negative Logits
    principalColumn    -0.72
    Zunanje            -0.70
    ecutive            -0.67
    jectures           -0.66
     MainAxisSize      -0.65
     كومونز            -0.62
    Tembelea           -0.60
    .~(\               -0.60
    BoxShadow          -0.60
     Paglinawan        -0.60
    Positive Logits
    ↵↵                      0.92
    <eos>                   0.74
    ↵↵↵                     0.71
    ↵↵↵↵                    0.68
    The                     0.60
    ↵↵↵↵↵↵↵↵↵↵↵↵↵↵          0.60
    ↵↵↵↵↵↵                  0.58
    ↵↵↵↵↵                   0.57
    [token not rendered]    0.54
    ↵↵↵↵↵↵↵↵↵               0.53
    Act Density 0.578%

    No Known Activations