INDEX
    Explanations
    NEGATIVE LOGITS
    \           0.89
    si          0.80
    in          0.77
    žič         0.76
    don         0.75
    sen         0.73
    h           0.73
                0.73
    dan         0.70
    financ      0.70
    POSITIVE LOGITS
     match      1.00
     matches    1.00
    マッチ         0.98
                0.96
                0.93
    に合わせて       0.88
     matchups   0.85
     mismatch   0.84
    匹配          0.83
     Match      0.80
    Act Density 0.177%

    No Known Activations