INDEX
    Explanations

    phrases questioning the reasons or justifications behind actions or statements

    Negative Logits
     Theſe              -0.85
     tartalomajánló     -0.82
     theſe              -0.82
     myſelf             -0.81
    Portale             -0.80
     noDo               -0.80
    InitVars            -0.79
    DeleteBehavior      -0.79
     CWE                -0.79
     Efq                -0.79
    Positive Logits
    0.56
    also
    0.50
    ,
    0.48
    org
    0.46
     “
    0.45
    ↵↵
    0.45
     also
    0.44
     incluso
    0.44
     -
    0.42
     comp
    0.42
    Activation Density: 0.111%

    No Known Activations