    NEGATIVE LOGITS
    Token                 Logit
    " زی"                 -0.07
    " розповід"           -0.07
    "explain"             -0.07
    " CAT"                -0.06
    "Sci"                 -0.06
    "áč"                  -0.06
    "Less"                -0.06
    "ElementException"    -0.06
    "Western"             -0.06
    " Russell"            -0.06
    POSITIVE LOGITS
    Token           Logit
    " honor"        0.20
    " Honor"        0.18
    " honour"       0.17
    " honors"       0.17
    " honored"      0.16
    " honoring"     0.15
    " Honour"       0.13
    " honoured"     0.12
    " Hon"          0.12
    " honorable"    0.11
    Activation Density: 0.009%

    No Known Activations