Explanations
adjectives related to physical characteristics
Neuron Alignment
Index  Value  % of L₁
1870   +0.16  0.5%
438    +0.12  0.4%
1253   +0.10  0.3%
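This reading of the Neuron Alignment rows is an assumption based on common SAE dashboard conventions: Value is an entry of the feature's decoder weight vector, indexed by neuron, and % of L₁ is that entry's magnitude divided by the vector's L₁ norm. Under that reading, 0.16 at 0.5% implies an L₁ norm of roughly 0.16 / 0.005 = 32. The sketch below is in plain Python, and `neuron_alignment` is a hypothetical helper, not part of any dashboard library.

```python
def neuron_alignment(weights, k=3):
    """Return the k largest (signed) entries of a feature's weight vector,
    each paired with its share of the vector's L1 norm.

    Hypothetical helper: assumes `weights` is the feature's decoder
    direction expressed in the neuron basis.
    """
    l1 = sum(abs(w) for w in weights)  # total L1 norm of the vector
    # Rank neuron indices by signed value, largest first.
    top = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]
    return [(i, weights[i], abs(weights[i]) / l1) for i in top]


# Toy vector, not the real feature's weights.
for idx, value, share in neuron_alignment([0.1, -0.05, 0.3, 0.0, 0.2], k=2):
    print(f"{idx:<5} {value:+.2f}  {share:.1%}")
```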
Correlated Neurons
Index  Pearson Corr.  Cos Sim.
438    +0.16          0.05
1327   +0.12          0.03
208    +0.10          0.04
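The following minimal sketch covers the two statistics in the Correlated Neurons table. It assumes that P. Corr. is the Pearson correlation between the feature's activations and a neuron's activations, taken over the same tokens. The Cos Sim. column is a cosine similarity between two vectors, but this page does not say which vectors are compared, so `cosine` simply accepts any two. Both helpers are hypothetical illustrations.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length activation sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy inputs: a feature and a neuron that rise and fall together.
print(pearson([0.0, 1.2, 0.0, 0.8], [0.1, 0.9, 0.2, 0.7]))
print(cosine([1.0, 0.0, 0.5], [0.5, 0.5, 0.5]))
```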
Negative Logits
vété        -0.79
unspeak     -0.76
fameux      -0.72
congrès     -0.70
miroir      -0.70
tournant    -0.68
appui       -0.68
nuage       -0.67
complément  -0.67
levier      -0.66
Positive Logits
vogli       1.00
ideolog     0.99
utop        0.95
succede     0.93
solidar     0.89
<bos>       0.87
gymnas      0.79
voleva      0.78
patin       0.75
atle        0.75
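The Negative Logits and Positive Logits lists rank vocabulary tokens by how strongly the feature pushes their logits down or up. The minimal sketch below assumes that each value is the dot product of the feature's decoder direction with a token's unembedding vector. The dashboard's exact scaling is not stated, so the sketch uses raw dot products. The `unembed` layout (one row per token) and both helper names are assumptions.

```python
def logit_effects(decoder_dir, unembed):
    """Direct effect of a feature direction on each token's logit.

    Assumed layout: `unembed` has one row per vocabulary token,
    each row the same length as `decoder_dir`.
    """
    return [sum(d * u for d, u in zip(decoder_dir, row)) for row in unembed]


def top_and_bottom_tokens(effects, vocab, k=10):
    """Return (most negative k, most positive k) as (token, value) pairs."""
    order = sorted(range(len(vocab)), key=lambda i: effects[i])  # ascending
    bottom = [(vocab[i], effects[i]) for i in order[:k]]
    top = [(vocab[i], effects[i]) for i in reversed(order[-k:])]
    return bottom, top


# Toy three-token vocabulary with a two-dimensional residual stream.
effects = logit_effects([1.0, 0.0], [[0.5, 1.0], [-1.0, 0.0], [2.0, 3.0]])
print(top_and_bottom_tokens(effects, ["a", "b", "c"], k=1))
```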
Activation Density: 0.269%
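Under the usual convention, activation density is the fraction of tokens in an evaluation sample on which the feature's activation is positive. This page does not say which sample was used. The minimal sketch below assumes that "firing" means an activation strictly above zero, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which the feature fires (activation > threshold)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy sample: the feature fires on 269 of 100,000 tokens.
acts = [0.0] * 99_731 + [1.0] * 269
print(f"{activation_density(acts):.3%}")
```

Read this way, a density of 0.269% means the feature fires on about 269 of every 100,000 tokens, so it is a sparse feature.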