    Negative Logits
     fully          -0.08
                    -0.07
                    -0.07
     favourable     -0.06
     architects     -0.06
                    -0.06
    -care           -0.06
                    -0.06
    Correct         -0.06
                    -0.06
    Positive Logits
    artin           0.08
     برابر          0.07
     edildi         0.07
    ίλ              0.06
     dues           0.06
     Ihrer          0.06
    Oi              0.06
    _serializer     0.06
     Ler            0.06
     Firearms       0.06
    Activation Density: 0.053%

    No Known Activations