INDEX
    Explanations

    proper nouns or specific terms

    Negative Logits
    " ("         0.71
    " 여러"       0.61
    " afresh"    0.57
    " אחד"       0.56
    " scrutin"   0.55
    " וש"        0.53
    (blank)      0.52
    "这个"        0.52
    " بڑی"       0.52
    " vasos"     0.52
    Positive Logits
    (blank)      0.86
    " in"        0.82
    (blank)      0.82
    "та"         0.81
    (blank)      0.73
    "."          0.72
    (blank)      0.70
    "ın"         0.70
    "ের"         0.67
    "т"          0.66
    Act Density 0.120%

    No Known Activations