    Negative Logits
    " fraught"        0.42
    "ádiz"            0.40
    "でしょうか"      0.39
    " BadRequest"     0.38
    " ды"             0.38
    (token missing)   0.38
    "𝒹"               0.38
    (token missing)   0.38
    " direct"         0.37
    " malign"         0.37
    Positive Logits
    " v"              1.51
    "v"               1.29
    " V"              1.08
    "V"               1.04
    (token missing)   0.82
    (token missing)   0.78
    (token missing)   0.76
    "watch"           0.75
    (token missing)   0.75
    "𝒗"               0.71
    Activation Density: 0.002%

    No Known Activations