    NEGATIVE LOGITS
    Token            Logit
    " ldc"           -0.55
    "blom"           -0.49
    " uſe"           -0.47
    "しろ"           -0.45
    " BorderSide"    -0.45
    " Doğ"           -0.45
    "toContain"      -0.45
    " Bergh"         -0.43
    "thmus"          -0.42
    " Hul"           -0.42
    POSITIVE LOGITS
    Token            Logit
    "Q"              1.24
    " Q"             1.22
    "q"              1.16
    " Queen"         1.15
    " q"             1.10
    "Queen"          1.09
    " queen"         1.05
    " QUEEN"         1.03
    "QB"             1.03
    "queen"          1.02
    Activation Density: 0.172%

    No Known Activations