Explanations
instances of the word "like."
Neuron Alignment
Index   Value   % of L₁
50      +0.23   1.0%
331     +0.12   0.5%
1334    +0.10   0.4%
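The Neuron Alignment table can be read as a ranking of the feature's decoder vector by neuron. The sketch below assumes, without confirmation from this page, that Value is the decoder weight on each neuron and that % of L₁ is that weight's magnitude divided by the L₁ norm of the whole vector. `neuron_alignment` and its arguments are hypothetical names.

```python
def neuron_alignment(decoder_vec, k=3):
    """Rank neurons by |weight| in a feature's decoder vector (hypothetical helper).

    Assumes "Value" = decoder_vec[i] and
    "% of L1" = |decoder_vec[i]| / sum_j |decoder_vec[j]|.
    Returns (neuron_index, value, fraction_of_l1) for the top-k neurons.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    # sorted() is stable, so tied neurons keep their original index order
    order = sorted(range(len(decoder_vec)), key=lambda i: -abs(decoder_vec[i]))
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in order[:k]]


# Toy 4-neuron decoder vector, not real data
rows = neuron_alignment([0.1, -0.4, 0.2, 0.3], k=2)
for idx, value, frac in rows:
    print(f"{idx:<6}{value:+.2f}  {frac:.1%}")
```

Under this reading, +0.23 at 1.0% implies an L₁ norm of roughly 23, so no single neuron carries much of the feature.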
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
554     +0.23           0.04
1101    +0.12           0.04
331     +0.10           0.03
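Correlated Neurons ranks neurons by how closely their activations track the feature's. A minimal sketch, assuming P. Corr. is the Pearson correlation between the feature's and a neuron's activations over the same tokens. This page does not say which vectors the Cos. Sim. column compares, so `cosine_similarity` is shown generically. All function names and the toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length activation lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (uncentered, unlike Pearson)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """neuron_acts maps neuron index -> activations on the same tokens as feature_acts."""
    scored = [(i, pearson(feature_acts, acts)) for i, acts in neuron_acts.items()]
    scored.sort(key=lambda t: t[1], reverse=True)  # highest correlation first
    return scored[:k]


# Toy activations over 5 tokens, not real data
feature = [0.0, 2.0, 0.0, 1.5, 0.0]
neurons = {7: [0.1, 1.9, 0.0, 1.2, 0.2], 9: [1.0, 0.0, 1.0, 0.0, 1.0]}
print(correlated_neurons(feature, neurons, k=2))
```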
Negative Logits
<bos>         -1.86
/***          -0.58
Enlaces       -0.57
///**         -0.56
ref           -0.56
ыгана         -0.54
乜            -0.54
Παραπομπές    -0.54
Eksterne      -0.53
</table>      -0.52
Positive Logits
impractica    1.41
impra         1.15
disagre       1.10
perfon        1.08
liberality    1.07
unwarran      1.03
sovere        1.03
viciss        1.02
reluct        1.01
ftu           1.01
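The two logit lists are consistent with projecting the feature's output direction onto the unembedding matrix and reading off the most negative and most positive tokens. The sketch below assumes that reading, assumes the direction is already expressed in the residual stream, and ignores any final layer norm. `logit_effects`, the row-per-token `unembed` layout, and the toy vocabulary are hypothetical.

```python
def logit_effects(direction, unembed, vocab, k=10):
    """Score every token by the dot product direction . unembed[token] (hypothetical helper).

    unembed holds one row per vocabulary token, each of length d_model.
    Returns (most_negative, most_positive), each a list of (token, score).
    """
    scores = [
        (tok, sum(w * x for w, x in zip(row, direction)))
        for tok, row in zip(vocab, unembed)
    ]
    ranked = sorted(scores, key=lambda t: t[1])  # ascending: most negative first
    return ranked[:k], ranked[::-1][:k]


# Toy 3-token vocabulary with d_model = 2, not real data
vocab = ["<bos>", "ref", "impra"]
unembed = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
neg, pos = logit_effects([-2.0, 0.2], unembed, vocab, k=1)
print(neg, pos)
```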
Activation Density: 0.084%
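Activation density is usually the fraction of tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation above zero (the threshold is an assumption, and `activation_density` is a hypothetical name). At 0.084%, that is about 84 tokens in every 100,000.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds threshold."""
    return sum(1 for a in acts if a > threshold) / len(acts)


# Toy data: 84 active tokens out of 100,000, not real data
acts = [1.0] * 84 + [0.0] * (100_000 - 84)
print(f"{activation_density(acts):.3%}")
```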