INDEX
    Explanations

    significant differences

    Negative Logits
        Token              Logit
        " significant"     -1.44
        "Significant"      -1.20
        "significant"      -1.13
        " Significant"     -1.04
        " önemli"          -1.02
        " SIGNIFIC"        -0.98
        " суще"            -0.89
        " значи"           -0.89
        " конкурс"         -0.88
        "kuri"             -0.87

    Positive Logits
        Token              Logit
        " differences"     1.84
        " difference"      1.76
        " Difference"      1.41
        "Difference"       1.38
        "difference"       1.30
        " Differences"     1.25
        "Differences"      1.20
        "differences"      1.11
        " diferencias"     1.11
        "差异"             1.07

    Activation Density: 0.043%

    No Known Activations