Explanations
phrases indicating commitment or assurance in relationships or agreements
Neuron Alignment
Index    Value    % of L₁
50       +0.21    0.8%
537      +0.12    0.5%
1527     +0.11    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
537      +0.21       0.04
1413     +0.12       0.03
596      +0.11       0.03
Negative Logits
<bos>             -2.10
betweenstory      -0.75
styleType         -0.70
DisplayMetrics    -0.68
stackrel          -0.65
react             -0.65
WriteLiteral      -0.64
isDirectory       -0.64
uxxxx             -0.64
adaptiveStyles    -0.63
Positive Logits
maneu        1.55
shenan       1.45
hentai       1.45
impra        1.43
depic        1.42
disreg       1.40
stockholm    1.40
embodi       1.39
resear       1.38
snoopy       1.36
Activations Density: 0.207%