INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.08
1:0.07
2:0.08
3:0.07
4:0.07
5:0.07
6:0.09
7:0.08
8:0.08
9:0.08
10:0.08
11:0.07
Negative Logits
antiquity   -1.51
Gleaming    -1.48
lowly       -1.43
fanbase     -1.37
purity      -1.36
bona        -1.34
parity      -1.33
fortunes    -1.32
receptors   -1.28
rivals      -1.27
Positive Logits
ndra               1.64
ertodd             1.48
________________   1.47
iasco              1.41
redacted           1.36
Hasan              1.29
Jail               1.28
Jindal             1.28
phabet             1.26
ahn                1.26
Activations Density 0.000%
No Known Activations