    NEGATIVE LOGITS
    Token                 Logit
    "с"                   0.49
    "ς"                   0.38
    "terra"               0.37
    "beer"                0.37
    " it"                 0.37
    "company"             0.36
    "setColor"            0.35
    (token not shown)     0.34
    "ئا"                  0.34
    "但是"                0.34
    POSITIVE LOGITS
    Token                 Logit
    "in"                  0.70
    "on"                  0.57
    "ر"                   0.54
    "ين"                  0.52
    "ار"                  0.50
    "ר"                   0.50
    " on"                 0.50
    "r"                   0.49
    "ar"                  0.49
    " hvis"               0.49
    Activation Density: 1.215%

    No Known Activations