INDEX
Explanations
phrases related to motivation and psychological factors influencing behavior
Neuron Alignment
Index    Value    % of L₁
1306     +0.15    0.5%
2004     +0.13    0.4%
1950     +0.12    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1306     +0.15       0.03
1950     +0.13       0.03
1516     +0.12       0.02
Negative Logits
palab       -0.61
robus       -0.59
amal        -0.59
arbitrar    -0.58
calum       -0.56
civiliz     -0.56
lana        -0.56
Verk        -0.56
haer        -0.55
incompet    -0.55
Positive Logits
motivation     1.26
motivated      1.15
motivation     1.10
Motivation     1.10
motivated      1.10
Motivation     1.10
motivations    1.06
motiv          1.05
Motiv          1.03
motivate       1.03
Activations Density 0.062%