    Explanations

    perceived moral situation

    NEGATIVE LOGITS
    " direc"        0.44
    " warmup"       0.42
    " directorio"   0.41
    " contribut"    0.40
    "lite"          0.39
    "权重"          0.39
    "OfWeek"        0.39
    " interests"    0.38
    " discounts"    0.38
    "服装"          0.38
    POSITIVE LOGITS
    " masterpiece"  0.46
    " manusia"      0.45
    " tragedy"      0.43
    "這位"          0.40
    " Elon"         0.39
    "这位"          0.39
    " Wahrheit"     0.39
    "那位"          0.38
                    0.38
    " انکار"        0.38
    Act Density 0.000%

    No Known Activations