INDEX
Explanations
instances of the word "attack" and its variations
Negative Logits
Token       Logit
{~          -0.74
much        -0.61
beliebt     -0.61
whole       -0.61
Ав          -0.61
gu          -0.61
RUNTIME     -0.60
ύ           -0.57
गु          -0.56
թվական      -0.55
Positive Logits
Token       Logit
Attack      1.75
ATTACK      1.65
attack      1.64
attacks     1.59
Attacks     1.57
ATTACK      1.56
attack      1.55
Attacks     1.53
attacks     1.48
Attack      1.44
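
Logit lists like the two above are commonly produced by projecting a feature's output direction through the model's unembedding matrix and sorting the resulting per-token scores. The sketch below illustrates that idea only; the page does not say how its values were computed. The names `direction`, `unembed`, `vocab`, and `top_logit_tokens` are hypothetical, and the numbers are toy values, not the ones listed here.

```python
import numpy as np

def top_logit_tokens(direction, unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    direction: (d_model,) feature output direction (assumed input)
    unembed:   (d_model, vocab_size) unembedding matrix (assumed input)
    vocab:     list of vocab_size token strings
    """
    logits = direction @ unembed              # one score per vocabulary token
    order = np.argsort(logits)                # indices sorted ascending by score
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy data: 2-dimensional model, 5-token vocabulary (all values made up).
vocab = ["attack", "Attack", "much", "whole", "gu"]
direction = np.array([1.0, 0.0])
unembed = np.array([
    [2.0, 3.0, -1.0, -2.0, 0.0],
    [0.0, 0.0,  0.0,  0.0, 0.0],
])

negative, positive = top_logit_tokens(direction, unembed, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

With a real model the same call would take the feature's decoder vector and the model's unembedding weights; any normalization applied before the projection is left out of this sketch.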
Activations Density 0.072%
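
Activation density is usually read as the fraction of sampled tokens on which the feature fires at all, so 0.072% corresponds to roughly 72 active tokens per 100,000. A minimal sketch of that calculation follows, assuming "fires" means an activation strictly above zero. The function name `activation_density` and the sample data are hypothetical, chosen only so the toy result matches the figure shown.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Toy sample: 100,000 tokens, 72 of which activate the feature (made-up data).
sample = [0.0] * 99_928 + [1.5] * 72
density = activation_density(sample)
print(f"{density:.3%}")
```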