INDEX
    Explanations

    sentences related to ethics and responsibility

    punctuation marks, specifically periods, indicating the end of statements

    Negative Logits
    ikuman
    -0.69
     mosqu
    -0.66
     undermin
    -0.64
     glim
    -0.63
    ensibly
    -0.62
    omorphic
    -0.59
    iste
    -0.59
     stranger
    -0.59
    ogly
    -0.58
     initialization
    -0.57
    Positive Logits
    ↵↵
    0.99
     They
    0.95
     Secondly
    0.93
     Additionally
    0.92
     However
    0.91
    0.91
     Also
    0.90
    ↵³
    0.89
     Therefore
    0.87
     Alternatively
    0.85
    Act Density 0.677%
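
The page does not define "Act Density". A common reading, assumed here, is the fraction of sampled token positions on which the feature has a nonzero activation. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means (number of active tokens) / (total tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 7 of which activate the feature.
sample = [0.0] * 993 + [0.4, 1.2, 0.1, 2.5, 0.8, 0.3, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.677% would mean the feature fires on roughly 7 out of every 1000 tokens.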

    No Known Activations