    Explanations

    references to freedom and patriotism

    Negative Logits
    ãĥ¼ãĥĨ
    -0.75
    ħĭ
    -0.73
    Magikarp
    -0.69
     organizational
    -0.69
    ãĥ¼ãĥĨãĤ£
    -0.64
     informational
    -0.64
    ij士
    -0.63
    ãĤ¼ãĤ¦ãĤ¹
    -0.62
     workplaces
    -0.62
     undermin
    -0.60
    Positive Logits
    1.11
    ..
    0.91
     ..
    0.84
    ..."
    0.83
    ...
    0.80
    -"
    0.78
    ↵↵
    0.76
     //
    0.75
     */
    0.74
    ...)
    0.74
    Activation Density: 0.097%

    No Known Activations