INDEX
    Explanations
    No Explanations Found
    Negative Logits
    "in"      0.38
    "a"       0.37
    "re"      0.35
    "or"      0.34
    "bl"      0.34
    "c"       0.34
    "Type"    0.33
    "ad"      0.33
    "A"       0.33
    "on"      0.32
    Positive Logits
    " Każ"        0.48
    " prípade"    0.45
    (not shown)   0.44
    " avete"      0.43
    " tivesse"    0.40
    (not shown)   0.40
    "ńskie"       0.38
    " ragazzi"    0.38
    " Sebelum"    0.38
    " mutta"      0.37
    Activation Density: 0.058%

    No Known Activations