INDEX
Explanations
phrases related to actions and behaviors
Head Attr Weights
0:0.07
1:0.07
2:0.09
3:0.08
4:0.08
5:0.07
6:0.08
7:0.07
8:0.09
9:0.09
10:0.07
11:0.08
Negative Logits
spine     -1.75
NAS       -1.56
ULL       -1.54
chest     -1.52
akeru     -1.51
uesday    -1.48
901       -1.47
zhen      -1.47
ern       -1.47
875       -1.47
Positive Logits
".[             1.64
characterize    1.55
}}}             1.53
fielder         1.51
propagation     1.51
civilisation    1.49
nuisance        1.48
breeds          1.48
behavi          1.48
consequential   1.48
Activations Density 0.000%