INDEX
Explanations
phrases descriptive of features and evaluations in reviews, especially related to video games
Neuron Alignment
Index   Value   % of L₁
1839    +0.10   0.3%
1110    +0.08   0.2%
152     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
284     +0.10      0.08
1839    +0.08      0.06
152     +0.08      0.05
Negative Logits
ivi         -1.59
emphat      -1.54
accla       -1.49
volunte     -1.43
increa      -1.41
embra       -1.41
apprehen    -1.41
philanth    -1.40
reluct      -1.39
Confe       -1.39
Positive Logits
regarding   0.62
spli        0.61
when        0.60
towards     0.59
toward      0.59
CONFLICT    0.58
in          0.58
for         0.58
among       0.57
between     0.57
Activations Density: 0.407%