    Explanations

    Discussions about societal issues and moral dilemmas.

    The neuron detects racist or strongly derogatory language aimed at social groups (demeaning or offensive statements).

    Negative Logits
    ကိုးကား              -0.44
     bune               -0.39
    +:+                 -0.36
    (token not shown)   -0.34
    ภาค                 -0.34
     heureux            -0.34
     excelencia         -0.34
    onViewCreated       -0.33
    orsche              -0.32
    eterangan           -0.31
    Positive Logits
     NSCoder            0.60
    queryInterface      0.52
     prostitution       0.50
    ThroughAttribute    0.49
     illegal            0.49
     onPostExecute      0.49
    DebuggerNonUser     0.49
     harmful            0.48
     behaviors          0.47
    Bibliograf          0.47
    Activation Density: 0.096%

    No Known Activations