INDEX
Explanations
instances of the word "apparently" in sentences
Neuron Alignment
Index   Value   % of L₁
50      +0.11   0.4%
1047    +0.05   0.2%
1506    +0.05   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1853    +0.11      0.03
198     +0.05      0.03
710     +0.05      0.03
Negative Logits
<bos>     -1.99
/***      -0.86
ⓧ         -0.81
public    -0.78
/*!       -0.78
//}       -0.77
})();     -0.75
(blank)   -0.74
///**     -0.74
<?        -0.73
Positive Logits
affor       2.17
maneu       2.12
increa      2.05
stockholm   2.05
guarante    1.99
volunte     1.98
fta         1.95
accla       1.94
thut        1.90
aen         1.90
Activations Density: 0.062%