    Explanations

    words related to emotions and reactions, especially negative emotions like disgust, anger, and upset

    Negative Logits
     meras        -0.88
     utop         -0.85
     makro        -0.83
     kram         -0.77
     elek         -0.77
     hunde        -0.71
     paus         -0.71
     ortop        -0.70
     sement       -0.70
     palet        -0.70
    Positive Logits
     Plotting     0.64
     about        0.62
     by           0.53
     INPUTS       0.53
     Iterate      0.51
    jątk          0.51
     Initialise   0.49
    ness          0.49
     because      0.48
     Parsing      0.47
    Act Density 0.228%

    No Known Activations