    Explanations

    topics related to morality and ethical debates

    Negative Logits
    _mC             -0.16
    _tC             -0.15
    _tF             -0.15
    rud             -0.15
    posables        -0.14
    _mD             -0.14
    _mB             -0.14
    imu             -0.14
    _tE             -0.13
    SplitOptions    -0.13
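Values like those above are often estimated by projecting a feature's decoder direction onto each token's unembedding vector and sorting the results. The sketch below assumes that "direct logit" estimate (ignoring the final layer norm); the function names `logit_effects` and `top_k` and the toy vectors are hypothetical, not this dashboard's actual code.

```python
# Hypothetical sketch: rank tokens by the dot product of a feature's decoder
# direction with each token's unembedding column (final layer norm ignored).
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Return d . W_U[:, t] for every token t in the (toy) vocabulary."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_k(effects: Dict[str, float], k: int, negative: bool) -> List[Tuple[str, float]]:
    """Return the k most negative (negative=True) or most positive tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=not negative)
    return ranked[:k]


# Toy usage with made-up 2-dimensional vectors.
toy_unembed = {"favor": [0.1, -0.05], "_mC": [-0.1, 0.1], "rud": [0.0, 0.2]}
toy_effects = logit_effects([1.0, -0.5], toy_unembed)
print(top_k(toy_effects, 2, negative=True))
```

Sorting the same dictionary in both directions is what lets one projection produce both the negative and the positive lists.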
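    Positive Logits
    晓 (Chinese: "dawn; to know")                 0.14
    具有 (Chinese: "to have, to possess")          0.13
    \xe8\x81 (incomplete UTF-8 byte pair)         0.13
    那种 (Chinese: "that kind of")                 0.13
    favor                                         0.13
    において (Japanese: "in, at, with regard to")   0.12
    能够 (Chinese: "to be able to")                0.12
    386                                           0.12
    517                                           0.12
    " regard" (with leading space)                0.12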
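The raw page displayed several of these tokens as strings like "åħ·æľī" because the tokenizer appears to use GPT-2-style byte-level BPE, which maps every byte to a printable character. The sketch below, assuming that GPT-2 byte-to-unicode table, reverses the mapping and decodes the bytes as UTF-8; `decode_token` is a hypothetical helper name. A token holding only part of a multi-byte character (such as the "èģ" entry) cannot be decoded fully, so it is shown with backslash escapes.

```python
# Sketch assuming GPT-2's byte-level BPE byte-to-unicode table.
from typing import Dict


def bytes_to_unicode() -> Dict[int, str]:
    """Map each byte 0-255 to a printable character, as GPT-2's tokenizer does."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:          # unprintable bytes are shifted to 256 + n
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level BPE token string back into readable UTF-8 text."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    # Partial multi-byte characters are rendered as \xNN escapes.
    return raw.decode("utf-8", errors="backslashreplace")


for tok in ["æĻĵ", "åħ·æľī", "èģ", "ãģ«ãģĬãģĦãģ¦", "Ġregard"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```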
    Activation Density: 0.050%
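Activation density is commonly reported as the share of sampled tokens on which a feature fires; 0.050% corresponds to roughly 5 active tokens per 10,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the function name `activation_density` and that default are hypothetical.

```python
# Hypothetical sketch: fraction of tokens whose feature activation exceeds a threshold.
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return active_tokens / total_tokens over the given activation sample."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Toy usage: 5 nonzero activations among 10,000 tokens.
sample = [0.0] * 9995 + [1.2] * 5
print(f"{activation_density(sample):.3%}")  # prints 0.050%
```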

    No Known Activations