INDEX
Explanations
phrases related to comparison or evaluation
Neuron Alignment
Index    Value    % of L₁
75       +0.13    0.4%
197      +0.12    0.4%
703      +0.12    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
75       +0.13       0.03
197      +0.12       0.03
703      +0.12       0.03
Negative Logits
newList      -0.71
newArr       -0.64
confided     -0.60
luxuriant    -0.59
indoc        -0.56
Rodrig       -0.56
Bartholo     -0.56
intersper    -0.55
forbade      -0.55
newData      -0.55
Positive Logits
terms        0.87
terms        0.84
Terms        0.82
TERMS        0.79
TERMS        0.78
Terms        0.73
termes       0.60
himo         0.58
homonymie    0.57
regards      0.55
Activations Density: 0.048%