INDEX
Explanations
descriptive phrases related to visual qualities and experiences
Neuron Alignment
Index   Value   % of L₁
50      +0.27   1.2%
1150    +0.20   0.9%
2034    +0.14   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1013    +0.27      0.14
284     +0.20      0.10
1150    +0.14      0.07
Negative Logits
<bos>          -2.49
solidar        -0.96
/***           -0.94
},{            -0.84
<?             -0.81
Referències    -0.78
ideolog        -0.77
Мексичка       -0.75
//{            -0.74
ⓧ              -0.74
Positive Logits
impra          0.94
disreg         0.93
friable        0.92
tupperware     0.92
unspeak        0.92
CHECKBOX       0.91
maneu          0.90
ecru           0.86
hoody          0.86
Applicability  0.85
Activations Density: 1.588%