INDEX
    Explanations

    warrior/krieger

    Negative Logits
    " Warrior"     -1.27
    "Warrior"      -1.16
    "warrior"      -1.11
    " warrior"     -0.96
    " Warriors"    -0.95
    "Warriors"     -0.86
    " warlike"     -0.75
    " Krieger"     -0.75
    "isters"       -0.74
    " warriors"    -0.73
    Positive Logits
    " Gla"         0.47
    " National"    0.47
    "ريم"          0.46
    " smart"       0.45
    " Who"         0.44
    " people"      0.44
    " Sol"         0.44
    " sm"          0.44
    " um"          0.44
    " po"          0.43
    Activation Density: 0.222%

    No Known Activations