    Explanations

    relationships involving loyalty and betrayal

    Negative Logits
    leftright     -0.20
    vara          -0.16
    ingles        -0.16
    IFn           -0.15
    liers         -0.15
    ael           -0.15
    nga           -0.15
    ÙĪØ±Ø¯        -0.15
    ovsky         -0.14
    655           -0.14

    Positive Logits
    еÑı           0.15
    mt            0.14
    imir          0.14
    hay           0.14
    embr          0.14
     Pav          0.14
     Patriot      0.14
    ance          0.13
    ius           0.13
     Pathfinder   0.13
    Activation Density: 0.045%

    No Known Activations