    NEGATIVE LOGITS
    "Match"        -0.82
    " Match"       -0.75
    "match"        -0.68
    " MATCH"       -0.55
    "MATCH"        -0.49
    "匹配"         -0.48
    " matching"    -0.47
    "anz"          -0.43
    " match"       -0.42
    "Matcher"      -0.42
    POSITIVE LOGITS
    " Theſe"        1.01
    " Efq"          0.93
    " Monfieur"     0.92
    " myſelf"       0.92
    " poffible"     0.88
    " poffe"        0.88
    "ſelf"          0.85
    " uſed"         0.84
    " pleaſure"     0.81
    " themſelves"   0.79
    Activation Density: 0.155%

    No Known Activations