INDEX
Explanations
adjectives or verbs related to opposition or disagreement
Neuron Alignment
Index   Value   % of L₁
605     +0.11   0.3%
321     +0.10   0.3%
617     +0.08   0.2%
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
321     +0.11           0.05
1759    +0.10           0.04
584     +0.08           0.03
Negative Logits
Token         Logit
<bos>         -0.58
ATEGY         -0.45
actualité     -0.43
argout        -0.42
rangs         -0.42
verläs        -0.40
earcher       -0.39
Література    -0.39
dade          -0.39
skriv         -0.39
Positive Logits
Token         Logit
stratigraph   0.73
frankfurt     0.63
depic         0.61
embodi        0.60
encomp        0.59
stockholm     0.58
opposes       0.57
munich        0.57
Silurian      0.56
resemb        0.56
Activations Density: 0.185%