    Negative Logits
     or            0.89
     apparently    0.83
    y              0.82
     then          0.80
     and           0.79
     (             0.78
     seemingly     0.77
     혹은          0.74
     there         0.74
     plus          0.73
    Positive Logits
    .,      2.49
    .).     2.30
    .):     2.05
    .:      2.03
    .);     1.96
    .),     1.94
    .)..    1.82
    .,"     1.81
    .)      1.79
    .;      1.76
    Activation Density: 0.341%

    No Known Activations