INDEX
    Explanations
    Negative Logits
     ridiculously     -0.08
     obscure          -0.07
     mở               -0.07
     unlikely         -0.07
    [blank]           -0.07
     headers          -0.07
    begr              -0.07
     hätte            -0.07
    [blank]           -0.07
     danh             -0.07
    Positive Logits
     incidencia       0.10
     häufiger         0.10
    Preference        0.09
     empath           0.09
     gemiddeld        0.09
     Preference       0.09
     vaker            0.09
     tendency         0.09
     vaak             0.09
     propensity       0.09
    Act Density 0.037%

    No Known Activations