INDEX
    Explanations

    references to 'truth' and related concepts of honesty and transparency

    Negative Logits
    Token       Logit
     Hij        -0.15
    ICLE        -0.15
    iac         -0.14
    equal       -0.14
    haus        -0.14
    hire        -0.14
    zet         -0.14
    à¹Ħว        -0.14
    herited     -0.13
    hof         -0.13
    Positive Logits
    Token       Logit
    fulness     0.35
    fully       0.34
    iness       0.28
    FUL         0.22
     serum      0.22
    FULL        0.21
     Serum      0.21
    full        0.20
    worthy      0.19
    ilde        0.18
    Activation Density: 0.015%

    No Known Activations