INDEX
Explanations
Adjectives describing qualities or states of being, particularly in phrases using "There is" or "There can be."
Neuron Alignment
Index   Value   % of L₁
1253    +0.10   0.3%
674     +0.09   0.3%
1705    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1590    +0.10      0.05
331     +0.09      0.05
1809    +0.08      0.04
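The "P. Corr." and "Cos Sim." columns above name two standard similarity measures. The page does not say which vectors it compares or how it computes the scores, so the following is only a minimal sketch of the textbook definitions of Pearson correlation and cosine similarity. The function names `pearson_corr` and `cosine_sim` and the sample vectors are hypothetical, chosen for illustration.

```python
import math


def pearson_corr(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return num / den


def cosine_sim(x, y):
    """Cosine similarity: dot product divided by the product of the norms."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return num / den


# Hypothetical sample vectors (not taken from the dashboard).
u = [1.0, 2.0, 3.0]
v = [4.0, 5.0, 6.0]
print(pearson_corr(u, v), cosine_sim(u, v))
```

The two measures differ only in the mean-centering step, which is why the same pair of vectors can score differently in the two columns.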
Negative Logits
Token      Logit
wherea     -1.16
fta        -1.06
attemp     -1.04
maneu      -1.03
encomp     -1.03
quitted    -1.01
purcha     -1.00
guarante   -1.00
strick     -0.99
increa     -0.99
Positive Logits
Token      Logit
worauf     0.68
plenty     0.65
reason     0.56
ways       0.55
no         0.53
nothing    0.53
sodass     0.53
womit      0.52
certain    0.52
cydow      0.52
Activations Density: 0.230%