Explanations
phrases related to control and regulation
Head Attr Weights
0:0.02
1:0.02
2:0.05
3:0.07
4:0.16
5:0.03
6:0.03
7:0.33
8:0.03
9:0.03
10:0.10
11:0.08
Negative Logits
[undecodable token]: -1.80
arnaev: -1.58
evidence: -1.47
bery: -1.46
lished: -1.45
nces: -1.43
pared: -1.42
payer: -1.41
pite: -1.40
ilipp: -1.39
Positive Logits
functions: 1.64
hunger: 1.54
vibration: 1.53
Alexa: 1.45
airflow: 1.45
brightness: 1.43
axis: 1.42
potency: 1.37
behaviors: 1.36
playback: 1.34
Activations Density: 0.023%