INDEX
Explanations
words related to physical combat or disputes
references to physical confrontations or disputes
Negative Logits
Token         Logit
Arch          -0.67
Mellon        -0.65
query         -0.63
indicating    -0.63
SS            -0.63
infer         -0.62
constituted   -0.62
sample        -0.61
tuber         -0.61
indicative    -0.60
Positive Logits
Token         Logit
fights        3.81
fights        2.94
battles       2.27
fight         2.26
fight         2.17
Fight         1.79
Fight         1.68
fighting      1.67
fighting      1.63
FIGHT         1.63
Activations Density 0.010%