INDEX
    Explanations

    text related to abstract concepts or arguments

    phrases related to understanding and discussion

    Negative Logits
    Token          Logit
    "shit"         -0.63
    "éĥ"           -0.56
    " raping"      -0.53
    "tumblr"       -0.53
    "Virgin"       -0.53
    "Same"         -0.53
    "ById"         -0.50
    "milo"         -0.49
    " inferior"    -0.49
    " murderer"    -0.48
    Positive Logits
    Token          Logit
    "ascript"       0.56
    " cautiously"   0.55
    " spoiler"      0.54
    " optimistic"   0.53
    " summarize"    0.51
    " conclud"      0.49
    " bookmark"     0.49
    " academic"     0.48
    " optimism"     0.48
    " diplom"       0.48
    Activation Density: 2.295%

    No Known Activations