INDEX
    Explanations

    correctness and common phrases

    NEGATIVE LOGITS
    " Presiden"     0.74
    " surfaces"     0.74
    "omega"         0.71
    "russ"          0.70
    " President"    0.68
    "owaniu"        0.67
    "хь"            0.67
    " threat"       0.65
    "ishops"        0.65
    " Threats"      0.64
    POSITIVE LOGITS
    " rectify"      0.94
    " 정확"          0.87
    "download"      0.85
    "correct"       0.84
    "incorrect"     0.82
    "baiki"         0.81
    "Correct"       0.81
    "donate"        0.80
    " inaccurate"   0.80
    " Tjiwarl"      0.78
    Act Density 0.002%

    No Known Activations