INDEX
    Explanations

    phrases and concepts related to truth and transparency

    Negative Logits
    Token      Logit
    "ully"     -0.18
    "sg"       -0.17
    "alu"      -0.16
    "sm"       -0.16
    "otel"     -0.16
    "AYS"      -0.15
    "志"       -0.15
    " Bà"      -0.15
    "ati"      -0.15
    "381"      -0.14
    Positive Logits
    Token        Logit
    "truth"      0.22
    "Truth"      0.21
    " truth"     0.21
    " Truth"     0.20
    "ンズ"       0.18
    "Expose"     0.18
    "ρού"        0.17
    " truths"    0.17
    " verdad"    0.17
    " freeing"   0.16
    Activation Density: 0.078%

    No Known Activations