Explanations
phrases related to beliefs and values, especially focusing on individual beliefs, societal values, and the juxtaposition of different beliefs and values

Neuron Alignment
Index   Value   % of L₁
1870    +0.10   0.3%
1265    +0.10   0.3%
1926    +0.09   0.3%
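The sketch below shows one way a Neuron Alignment list could be computed. It assumes, without confirmation from this page, that the list ranks the model-basis dimensions ("neurons") by the magnitude of the feature's decoder weight, and that "% of L₁" is that weight's magnitude divided by the L1 norm of the whole decoder vector. The helper `neuron_alignment` and the toy vector are hypothetical.

```python
# Sketch only: assumes "Neuron Alignment" ranks basis dimensions by the
# magnitude of the feature's decoder weight, and that "% of L1" is that
# magnitude divided by the decoder vector's L1 norm (an assumption).
def neuron_alignment(decoder_vec, top_k=3):
    """Return (index, weight, share_of_L1) for the top_k largest-magnitude weights."""
    l1 = sum(abs(w) for w in decoder_vec)  # L1 norm of the whole vector
    ranked = sorted(range(len(decoder_vec)),
                    key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:top_k]]

# Toy 5-dimensional decoder vector (made-up numbers, L1 norm = 1.0)
toy_decoder = [0.02, -0.5, 0.1, 0.3, -0.08]
for idx, weight, share in neuron_alignment(toy_decoder):
    print(f"{idx:<6}{weight:+.2f}   {share:.1%}")
```

If that reading is right, a weight of +0.10 amounting to only 0.3% of L₁ implies a decoder L1 norm on the order of 30, meaning the feature is spread thinly across many dimensions rather than aligned with any single neuron.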

Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
332     +0.10           0.05
136     +0.10           0.03
1363    +0.09           0.04
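The two columns of the Correlated Neurons table measure different things. As a hedged sketch, assume the correlation column is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens, and the cosine-similarity column compares two weight vectors geometrically; which weight vectors the viewer actually compares is an assumption here. The helpers `pearson` and `cosine` and all numbers are illustrative.

```python
import math

# Sketch only: assumes the correlation column is the Pearson correlation of
# activations over the same tokens, and the cosine column compares two
# weight vectors (which vectors the viewer uses is an assumption).
def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Made-up activations on five tokens and made-up 3-d weight vectors
feature_acts = [0.0, 1.2, 0.0, 3.1, 0.4]
neuron_acts = [0.1, 0.9, -0.2, 2.5, 0.3]
print(f"Correlation: {pearson(feature_acts, neuron_acts):+.2f}")
print(f"Cosine similarity: {cosine([1.0, 0.0, 0.2], [0.0, 1.0, 0.1]):.2f}")
```

Because correlation is measured across tokens while cosine similarity is purely geometric, a neuron can co-fire weakly with this feature (+0.09 to +0.10) while its weights point in a nearly orthogonal direction (0.03 to 0.05).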

Negative Logits
osal         -0.83
hcm          -0.73
rispond      -0.69
dovre        -0.69
encomp       -0.69
interro      -0.69
vogli        -0.67
ridu         -0.67
allarg       -0.66
pessi        -0.65

Positive Logits
beliefs      0.81
values       0.62
values       0.60
convictions  0.57
Values       0.56
EINVAL       0.56
thoughts     0.56
attitudes    0.55
principles   0.55
VALUES       0.53
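Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and taking the most positive and most negative tokens. The sketch below assumes that method; any normalization or scaling the viewer applies is left out, and the helpers `logit_effects` and `top_and_bottom`, the tiny `W_U`, and the token list are all hypothetical.

```python
# Sketch only: assumes each logit score is the decoder direction projected
# through the unembedding matrix W_U (residual dim x vocab); any scaling
# the viewer applies is not modeled.
def logit_effects(decoder_vec, W_U):
    """Dot the decoder vector with every unembedding column (one per token)."""
    vocab_size = len(W_U[0])
    return [sum(decoder_vec[d] * W_U[d][t] for d in range(len(decoder_vec)))
            for t in range(vocab_size)]

def top_and_bottom(effects, tokens, k=2):
    """Return the k most positive and k most negative (token, score) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(tokens[t], effects[t]) for t in order[:k]]
    positive = [(tokens[t], effects[t]) for t in reversed(order[-k:])]
    return positive, negative

# Toy example: 2-d residual stream, 4-token vocabulary (made-up numbers)
tokens = ["beliefs", "values", "osal", "thoughts"]
decoder_vec = [1.0, -0.5]
W_U = [[1.0, 0.1, -0.3, 0.5],
       [0.2, 0.4, 0.6, -0.6]]
pos, neg = top_and_bottom(logit_effects(decoder_vec, W_U), tokens)
print("Positive:", [(t, round(s, 2)) for t, s in pos])
print("Negative:", [(t, round(s, 2)) for t, s in neg])
```

Under this reading, the positive list shows which next tokens the feature promotes (belief and value words), while the negative list shows the tokens it suppresses.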

Activation Density: 0.327%
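A minimal sketch of how a density figure like this could be computed, assuming (not confirmed by the page) that density is the fraction of sampled tokens on which the feature's activation is strictly positive. The helper `activation_density` and the sample are hypothetical.

```python
# Sketch only: assumes density is the fraction of sampled tokens on which
# the feature's activation exceeds the threshold (0.0 by default).
def activation_density(acts, threshold=0.0):
    """Fraction of activations strictly above threshold."""
    fired = sum(1 for a in acts if a > threshold)
    return fired / len(acts)

# Made-up sample: the feature fires on 3 of 1,000 tokens
sample = [0.0] * 997 + [0.5, 1.2, 0.3]
print(f"Activation density: {activation_density(sample):.3%}")
```

Under that definition, a density of 0.327% means the feature fires on roughly 1 token in 300.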