INDEX
Explanations
instances of the word "it" followed by a certain context
Neuron Alignment
Index  Value  % of L₁
1438   +0.12  0.4%
674    +0.11  0.4%
871    +0.11  0.4%
Correlated Neurons
Index  P. Corr.  Cos Sim.
25     +0.12     0.08
1548   +0.11     0.06
82     +0.11     0.07
Negative Logits
McLaugh   -1.03
Châ       -0.99
McInt     -0.99
Bartholo  -0.95
Juf       -0.95
Perci     -0.94
Daven     -0.92
Gorb      -0.92
Rine      -0.89
Rodrig    -0.85
POSITIVE LOGITS
happened  0.74
happens   0.72
happen    0.67
wasn      0.65
seems     0.63
occurred  0.58
rained    0.58
alians    0.57
seemed    0.57
beho      0.57
Activations Density 0.393%