INDEX
Explanations
references to competitors or enemies in various contexts
references to competition or opposing entities
Negative Logits
udeb        -0.78
ADD         -0.70
olia        -0.69
uggage      -0.67
RGB         -0.67
ALT         -0.67
acca        -0.66
ortunate    -0.65
umatic      -0.65
Hat         -0.64
Positive Logits
rival          1.19
rivals         1.14
ries           1.04
challengers    0.93
competitor     0.92
competitors    0.90
adversaries    0.80
challenger     0.78
rivalry        0.78
untled         0.78
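The dashboard does not say how the logit lists are produced. A common approach for SAE features is to project the feature's decoder direction through the model's unembedding matrix and rank the resulting per-token scores. The sketch below assumes that approach; `decoder_direction`, `unembed`, and `vocab` are hypothetical inputs, and the toy numbers are illustrative, not taken from this feature.

```python
def logit_effects(decoder_direction, unembed, vocab, k=10):
    """Return the k most negative and k most positive token logit effects.

    Assumption: the effect on token j is the dot product of the feature's
    decoder direction with column j of the unembedding matrix W_U.
    decoder_direction: list of d_model floats (hypothetical)
    unembed: d_model rows, each a list of len(vocab) floats (hypothetical W_U)
    vocab: list of token strings, one per W_U column
    """
    d_model = len(decoder_direction)
    effects = [
        sum(decoder_direction[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending so the most negative effects come first.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 tokens.
    direction = [1.0, 0.5]
    w_u = [[1.0, 0.8, -0.6, -0.2],
           [0.4, 0.2, -0.2, -0.8]]
    tokens = ["rival", "competitor", "RGB", "Hat"]
    neg, pos = logit_effects(direction, w_u, tokens, k=2)
    print("Negative:", neg)
    print("Positive:", pos)
```

Under this assumption, a feature whose decoder direction aligns with the unembedding columns for "rival" and "competitor" would push those tokens' logits up, matching the pattern in the positive list above.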
Activations Density 0.014%
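Activation density is the share of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero (typical for ReLU-style SAEs); the `threshold` parameter and the token counts are illustrative assumptions. At 0.014%, the feature would be active on about 14 of every 100,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds threshold.

    Assumption: a feature counts as active when its activation > threshold.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 14 active positions out of 100,000 tokens.
    acts = [0.0] * 99_986 + [1.0] * 14
    print(f"{activation_density(acts) * 100:.3f}%")
```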