INDEX
Explanations
sentences expressing emotions or personal feelings
Neuron Alignment
Index   Value   % of L₁
1137    +0.14   0.5%
1271    +0.11   0.4%
410     +0.10   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1137    +0.14      0.06
1271    +0.11      0.05
410     +0.10      0.04
Negative Logits
cde               -0.53
Sitten            -0.49
Referencer        -0.47
المناصب           -0.47
Kanpo             -0.47
Referências       -0.44
habad             -0.43
Datum             -0.43
Erreferentziak    -0.43
Cat               -0.42
Positive Logits
FEEL     0.99
Felt     0.93
feel     0.93
feel     0.90
felt     0.89
Feel     0.87
Felt     0.85
feels    0.85
Feels    0.83
Feel     0.83
Activations Density 0.099%