    NEGATIVE LOGITS
    Token            Logit
    "tik"            -0.09
    " polis"         -0.09
    "alach"          -0.09
    "alu"            -0.08
    "amma"           -0.08
    " rhyth"         -0.08
    ".:.:."          -0.08
    " alcoholic"     -0.08
    "oxel"           -0.08
    "inding"         -0.08

    POSITIVE LOGITS
    Token            Logit
    " acid"           0.30
    " Acid"           0.21
    " acids"          0.19
    "acid"            0.16
    " кислот"         0.15
    "ally"            0.12
    "idal"            0.11
    "illin"           0.11
    "酸"              0.11
    "instanc"         0.11

    Activation Density: 0.030%

    No Known Activations