Explanations
phrases related to product reviews
Neuron Alignment
Index   Value   % of L₁
964     +0.12   0.4%
1150    +0.11   0.3%
674     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
981     +0.12      0.06
651     +0.11      0.03
227     +0.10      0.06
Negative Logits
ьаж          -1.09
Obrigado     -0.89
Obrigada     -0.87
цездатний    -0.86
تانيه        -0.85
насељу       -0.85
.*")]        -0.81
IsContent    -0.81
Зноскі       -0.79
!*\          -0.79
Positive Logits
reluct      2.75
encomp      2.59
shenan      2.54
increa      2.48
impra       2.46
affor       2.44
disagre     2.43
depic       2.41
guarante    2.39
resear      2.37
Activations Density 0.396%