INDEX
    Explanations
    Negative Logits
    0.77
    0.70
    0.70
    0.70
    0.69
    臨床
    0.68
    0.68
    0.67
    0.67
    0.66
    POSITIVE LOGITS
    " =" 0.58
    " " 0.56
    "ng" 0.49
    " good" 0.49
    "," 0.48
    " C" 0.48
    " heaven" 0.47
    " D" 0.46
    " boundless" 0.46
    " chop" 0.45
    Act Density 0.017%

    No Known Activations