INDEX
Explanations
Adverbs and expressions denoting uncertainty or opinion, such as 'kind of' and 'I think'
Neuron Alignment
Index   Value   % of L₁
1705    +0.10   0.3%
1618    +0.10   0.3%
1757    +0.09   0.3%
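The Neuron Alignment table can be read with a small sketch. It assumes (the page does not say so) that "Value" is the entry of the feature's weight vector at a given neuron index and that "% of L₁" is that entry's magnitude divided by the vector's L₁ norm. The helper name `neuron_alignment` and the choice to rank by absolute value are illustrative, not the dashboard's implementation.

```python
def neuron_alignment(weights, k=3):
    """Return (index, value, percent_of_l1) for the k largest-magnitude entries.

    Hypothetical helper: assumes `weights` is the feature's vector over neurons
    and that "% of L1" means |value| / sum(|values|) * 100.
    """
    l1 = sum(abs(w) for w in weights)  # L1 norm of the whole vector
    if l1 == 0:
        raise ValueError("weight vector is all zeros")
    # Rank neuron indices by magnitude, largest first (an assumption).
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], 100.0 * abs(weights[i]) / l1) for i in order[:k]]


# Toy example with a 4-neuron vector.
for index, value, pct in neuron_alignment([0.5, -2.0, 1.5, 0.0], k=2):
    print(f"{index}  {value:+.2f}  {pct:.1f}%")
# 1  -2.00  50.0%
# 2  +1.50  37.5%
```

A share as small as 0.3% for the top neuron means the feature's weight is spread over many neurons rather than concentrated in one.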
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
1705    +0.10           0.04
726     +0.10           0.03
1618    +0.09           0.02
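The correlation and cosine-similarity columns of the Correlated Neurons table are standard statistics. A minimal plain-Python sketch, assuming the correlation column is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens, and the cosine column compares two weight vectors (which vectors the dashboard compares is not stated here). The activation lists in the example are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (e.g. per-token activations)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sx * sy)


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (e.g. two weight vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        raise ValueError("cosine similarity undefined for a zero vector")
    return dot / (nu * nv)


# Hypothetical per-token activations for a feature and one neuron.
feature_acts = [0.0, 1.2, 0.0, 0.4, 0.0]
neuron_acts = [0.1, 0.9, -0.2, 0.5, 0.0]
print(round(pearson(feature_acts, neuron_acts), 3))
```

The two measures can disagree: correlation reflects co-activation on data, while cosine similarity looks only at the weights.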
Negative Logits
karet       -0.50
Val         -0.47
rokok       -0.46
FDRE        -0.45
}}]{        -0.45
Value       -0.44
AIT         -0.44
PointSize   -0.43
Ast         -0.43
Met         -0.42
Positive Logits
impractica  0.83
liberality  0.75
sorta       0.73
unce        0.71
unlaw       0.70
mortgagee   0.65
viciss      0.64
ingrat      0.64
shewn       0.64
Genau       0.63
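The negative and positive logit lists rank vocabulary tokens by how much the feature's output direction lowers or raises their logits. A minimal sketch, assuming each token's score is the dot product of the feature's decoder vector with that token's column of the unembedding matrix, ignoring the final LayerNorm and any scaling the dashboard may apply. The helper `logit_effects` and the toy matrix are hypothetical.

```python
def logit_effects(direction, unembed, vocab, k=3):
    """Score every token by direction @ unembed and return (bottom_k, top_k).

    Hypothetical helper. Assumes `direction` is the feature's decoder vector
    (length d_model) and `unembed` is a d_model x vocab_size matrix stored as a
    list of rows. LayerNorm and any dashboard-specific scaling are ignored.
    """
    if len(unembed) != len(direction):
        raise ValueError("unembed must have one row per residual dimension")
    # Dot product of the direction with each token's column of the unembedding.
    scores = [
        sum(d * row[t] for d, row in zip(direction, unembed))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy model: d_model = 2, vocabulary of 3 tokens.
W_U = [[1.0, 0.0, 2.0],
       [0.0, 1.0, 0.5]]
neg, pos = logit_effects([1.0, -1.0], W_U, ["a", "b", "c"], k=1)
print(neg, pos)  # [('b', -1.0)] [('c', 1.5)]
```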
Activation Density: 0.116%
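Activation density is the fraction of tokens on which the feature is active, so 0.116% corresponds to roughly 1.16 tokens per 1,000. The sketch below assumes "active" means an activation strictly above zero. The `threshold` parameter is a hypothetical knob, not a documented dashboard setting.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: the feature fires on 1 of 4 tokens.
print(activation_density([0.0, 0.3, 0.0, 0.0]))  # 25.0

# 116 active tokens out of 100,000 gives the density shown above.
print(activation_density([1.0] * 116 + [0.0] * 99884))  # 0.116
```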