INDEX
    Explanations

    references to a 'boss' or authority figures in various contexts

    Negative Logits
    " مشين"        -0.58
    " rapi"        -0.57
    " Rules"       -0.55
    "Rules"        -0.54
    " Hartmann"    -0.51
    "keiten"       -0.50
    "oureuse"      -0.49
    "metr"         -0.49
    "ultura"       -0.48
    "Leonardo"     -0.48
    Positive Logits
    "AddTagHelper"    0.83
                      0.79
    " batch"          0.76
    " nahilalakip"    0.73
    " duly"           0.73
    " sensation"      0.71
    "UnsafeEnabled"   0.70
    "JsonHelper"      0.69
    " mijne"          0.68
    "MLLoader"        0.68
    Activation Density: 0.130%

    No Known Activations