    Negative Logits
    Token       Logit
    "ва"        1.22
    "са"        1.08
    "ası"       1.02
    (blank)     1.00
    " quá"      0.99
    "ле"        0.99
    "ну"        0.99
    "в"         0.99
    "ра"        0.98
    (blank)     0.96
    Positive Logits
    Token       Logit
    " "         1.28
    " battles"  1.11
    " battle"   1.08
    " Battle"   0.91
    " Battles"  0.91
    "battle"    0.81
    "_"         0.79
    (blank)     0.78
    " war"      0.76
    " جنگ"      0.76
    Activation Density 0.005%

    No Known Activations