INDEX
    Explanations

    loyalty to people and groups

    Negative Logits
    Token    Value
    "n"      0.59
    "on"     0.53
    "o"      0.51
    "g"      0.50
    "h"      0.49
    "ah"     0.46
    "f"      0.46
    "ler"    0.45
    "d"      0.44
    "í"      0.43
    Positive Logits
    Token              Value
    " loyalty"         1.23
    "Loy"              1.08
    " loyal"           1.07
    " Loyalty"         1.06
    " loy"             0.95
    (not shown)        0.92
    " allegiance"      0.91
    " Loyal"           0.89
    " faithfulness"    0.79
    "loy"              0.76
    Act Density 0.046%

    No Known Activations