    Negative Logits
     verantwortlich     0.42
     हमलों              0.42
     nahezu             0.41
     kontroll           0.40
    Zach                0.40
                        0.40
     eigener            0.39
    laub                0.39
    percaya             0.39
     hochwert           0.38
    Positive Logits
     cardinality        0.49
     companionship      0.47
    াজন                 0.46
    𝒟                   0.46
    itió                0.46
     జరిగింది            0.45
    сан                 0.44
     misfit             0.44
    了一眼              0.43
    ênio                0.43
    Act Density 0.008%

    No Known Activations