INDEX
Explanations
potential threats in social situations, particularly related to strangers in specific settings like bars
Neuron Alignment
Index   Value   % of L₁
776     +0.10   0.3%
198     +0.09   0.3%
906     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
736     +0.10      0.06
1579    +0.09      0.04
1825    +0.09      0.03
Negative Logits
ecru          -0.72
hairc         -0.71
swarovski     -0.71
cushi         -0.69
ineffec       -0.66
lamborghini   -0.66
silken        -0.64
Whence        -0.64
tupperware    -0.63
disagre       -0.63
Positive Logits
suspiciously  0.64
suddenly      0.61
suspicious    0.60
signs         0.56
odd           0.55
strange       0.55
unusually     0.53
sudden        0.52
LookAnd       0.52
strangely     0.50
Activations Density 0.760%