Explanations
mentions of political or societal discussions and actions surrounding various issues
Neuron Alignment
Index    Value    % of L₁
50       +0.23    0.8%
1741     +0.21    0.7%
2019     +0.17    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
16       +0.23       0.08
50       +0.21       0.06
468      +0.17       0.05
Negative Logits
gardien      -0.83
oeil         -0.83
vété         -0.78
rayures      -0.77
ritratto     -0.75
lyder        -0.74
africains    -0.74
vérit        -0.73
couteau      -0.72
bieber       -0.72
Positive Logits
importance       0.71
possibility      0.67
implications     0.60
situation        0.60
role             0.59
minuta           0.59
extent           0.59
amount           0.58
possibilities    0.57
relationship     0.57
Activations Density: 0.329%