INDEX
    Explanations

    greetings and friendly openers

    Negative Logits
    " axiomatic"     0.44
    " blame"         0.40
    "Thoreau"        0.40
    " alcoholism"    0.39
    "事實"           0.37
    " dementia"      0.37
    "nonsense"       0.37
    " egregious"     0.36
    " Frankly"       0.36
    "idlertid"       0.36
    Positive Logits
    " awesome"       0.71
    " Awesome"       0.59
    " Hey"           0.57
    "Awesome"        0.53
    " hey"           0.51
    "awesome"        0.51
    "めっちゃ"       0.50
    " hi"            0.49
    " Hi"            0.49
    " hola"          0.48
    Activation Density: 0.001%

    No Known Activations