INDEX
Explanations
instances where the word "this" is followed by other words with varied activations
Neuron Alignment
Index    Value    % of L₁
468      +0.14    0.4%
2034     +0.11    0.3%
314      +0.09    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
468      +0.14       0.04
1648     +0.11       0.02
1718     +0.09       0.03
Negative Logits
venice       -1.36
eiffel       -1.33
jurassic     -1.30
autunno      -1.30
riviera      -1.29
toledo       -1.26
stockholm    -1.24
parteci      -1.22
outlander    -1.21
sappi        -1.20
Positive Logits
case          0.85
particular    0.79
instance      0.79
context       0.78
regard        0.77
case          0.71
scenario      0.71
situation     0.70
example       0.67
manner        0.67
Activations Density 0.096%