INDEX
    Explanations

    phrases related to commands, warnings, and instructions

    phrases that express certainty or existence

    Negative Logits
    Token       Logit
    "icio"      -0.56
    "artney"    -0.55
    "arlane"    -0.54
    "ibaba"     -0.52
    "76561"     -0.50
    "yip"       -0.50
    "20439"     -0.49
    "ento"      -0.49
    "yrinth"    -0.49
    "uggest"    -0.49
    Positive Logits
    Token       Logit
    "!"         1.36
    "!:"        1.33
    "!."        1.29
    " ;)"       1.23
    "!!!"       1.20
    ".:"        1.19
    " 🙂"       1.18
    " :)"       1.17
    "!!!!"      1.17
    "!,"        1.17
    Act Density 0.737%

    No Known Activations