INDEX
    Explanations
    Negative Logits
     houſe            -0.41
    WriteLiteral      -0.41
    ępnie             -0.40
     Photographie     -0.39
    AddTagHelper      -0.39
    WireFormatLite    -0.39
     folgender        -0.37
     posib            -0.36
     northwestern     -0.36
    gac               -0.36
    Positive Logits
    Autoritní         0.70
     but              0.68
     autorytatywna    0.64
     كومونز           0.59
    Попис             0.59
    GEBURTSDATUM      0.54
    Rüyada            0.53
    SequentialGroup   0.50
    but               0.50
    But               0.49
    Activation Density: 0.016%

    No Known Activations