INDEX
Explanations
references to counter-related concepts or activities
occurrences of the word "counter" in various contexts
Negative Logits
Forge      -0.79
ABE        -0.78
Known      -0.75
doms       -0.73
Pelicans   -0.72
OOD        -0.72
Roche      -0.72
Tornado    -0.70
Cursed     -0.68
Pistons    -0.67
Positive Logits
measures       1.23
intuitive      1.16
attack         1.13
intelligence   1.13
balance        1.10
counter        1.09
offensive      1.03
ror            1.00
fact           1.00
balanced       0.98
Activations Density: 0.016%