INDEX
    NEGATIVE LOGITS
    Token         Logit
    "」「"          0.46
    " 사람"        0.46
    "했고"         0.44
    " 없고"        0.43
    "었고"         0.41
    " 있고"        0.38
    (not shown)   0.37
    " 가서"        0.37
    "股权"         0.35
    (not shown)   0.35
    POSITIVE LOGITS
    Token                  Logit
    "↵↵↵"                  0.84
    "↵↵"                   0.83
    "↵↵↵↵"                 0.82
    "↵↵↵↵↵"                0.59
    "↵↵↵↵↵↵"               0.52
    " .,"                  0.43
    "↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵"      0.42
    "↵↵↵↵↵↵↵"              0.42
    "↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵"     0.41
    (whitespace only)      0.40
    Activation Density: 0.074%

    No Known Activations