INDEX
Explanations
phrases indicating comparison of different entities in terms of qualities
Neuron Alignment
Index    Value    % of L₁
1622     +0.10    0.3%
674      +0.10    0.3%
438      +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1622     +0.10       0.03
1262     +0.10       0.03
1643     +0.08       0.04
Negative Logits
Kün            -0.77
Simult         -0.65
Välislingid    -0.59
Ueb            -0.58
lele           -0.58
meras          -0.57
bambu          -0.56
tortas         -0.55
Jä             -0.55
Avez           -0.55
Positive Logits
WALTZ         0.59
not           0.56
nemici        0.56
ladri         0.55
setzer        0.55
giapp         0.54
nemico        0.52
not           0.52
disrespect    0.51
meant         0.50
Activations Density 0.146%