    Negative Logits
    Token             Logit
    " political"      -1.34
    " Political"      -1.16
    "Political"       -1.00
    " POLITICAL"      -0.96
    "political"       -0.93
    " politically"    -0.93
    " politischen"    -0.81
    " politische"     -0.75
    " politie"        -0.69
    " políticos"      -0.68
    Positive Logits
    Token             Logit
    "ReusableCell"    0.63
    "SFD"             0.54
    "manuelle"        0.53
    "ITY"             0.53
    "ity"             0.52
    "chauen"          0.52
    " Grüßen"         0.51
    "ecake"           0.51
    " exclamation"    0.51
    "chrän"           0.51
    Activation Density: 1.564%

    No Known Activations