    Negative Logits
    Token        Logit
    " Lego"      -0.07
    "—↵↵"        -0.07
    " Ras"       -0.06
    "asso"       -0.06
    "entai"      -0.06
    " bent"      -0.06
                 -0.06
    " tours"     -0.06
    " Ciudad"    -0.06
    "aston"      -0.06
    Positive Logits
    Token        Logit
    " opting"    0.09
    " option"    0.07
    "主義"       0.06
    "τερα"       0.06
    " thí"       0.06
    " opted"     0.06
    "ZERO"       0.06
    "imal"       0.06
    " Islamist"  0.05
    "however"    0.05
    Activation Density: 0.017%

    No Known Activations