INDEX
    NEGATIVE LOGITS
     for        0.78
    at          0.71
    ר           0.70
    an          0.69
    O           0.65
    0           0.59
    ار          0.57
    Ak          0.57
                0.57
    as          0.57
    POSITIVE LOGITS
    vana        0.55
     खासतौर     0.54
     باسک       0.50
    denly       0.49
    elry        0.49
     shallower  0.47
    ológico     0.46
     wealthier  0.46
     westerly   0.46
    ="./        0.46
    Act Density 0.010%

    No Known Activations