INDEX
Explanations
phrases related to legal documents or policies
Neuron Alignment
Index   Value   % of L₁
50      +0.26   0.9%
1741    +0.21   0.7%
1967    +0.17   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
16      +0.26      0.10
50      +0.21      0.07
1967    +0.17      0.07
Negative Logits
kasarigan         -0.91
pensieri          -0.91
sentimenti        -0.85
parut             -0.83
sogni             -0.78
florales          -0.76
kompati           -0.76
verifyException   -0.76
felicità          -0.74
commenti          -0.71
Positive Logits
same              0.66
Varies            0.63
Facile            0.62
Mechanisms        0.61
fanci             0.61
impractica        0.60
Applicability     0.60
dreaded           0.58
aforementioned    0.57
Defective         0.56
Activations Density 0.510%