    NEGATIVE LOGITS
    Token             Logit
    "contentLoaded"   -0.70
    " kaarangay"      -0.63
    " BoxFit"         -0.62
    " Biôgrafia"      -0.61
    "protoimpl"       -0.59
    " PopupWindow"    -0.59
    " oprot"          -0.58
    " Roskov"         -0.57
    " Vikipedi"       -0.55
    "ografija"        -0.55
    POSITIVE LOGITS
    Token             Logit
    " good"           1.13
    "good"            0.88
    "Good"            0.73
    " Good"           0.70
    " GOOD"           0.68
    " buena"          0.67
    " goede"          0.65
    "GOOD"            0.63
    " gute"           0.60
    " buenas"         0.60
    Activation Density: 0.001%

    No Known Activations