INDEX
Explanations
references to personal experiences and beliefs, especially related to advocating for a specific cause or disagreeing with others
Neuron Alignment
Index   Value   % of L₁
1978    +0.13   0.4%
1150    +0.08   0.2%
823     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1919    +0.13      0.05
1631    +0.08      0.04
1166    +0.08      0.05
Negative Logits
NTIS           -0.77
actionTypes    -0.65
unlaw          -0.62
animés         -0.61
dovr           -0.59
bahay          -0.58
CiNii          -0.57
kuni           -0.57
naer           -0.57
ftu            -0.57
Positive Logits
CreateTagHelper    0.51
preso              0.47
ionais             0.44
chuy               0.42
ricanes            0.42
duong              0.41
<bos>              0.40
іан                0.40
níku               0.39
"/",               0.39
Activations Density: 0.399%