Explanations
No Explanations Found
Head Attr Weights
0:0.07
1:0.07
2:0.08
3:0.08
4:0.08
5:0.09
6:0.09
7:0.08
8:0.08
9:0.08
10:0.08
11:0.06
Negative Logits
invalid          -1.64
discrepancies    -1.63
suspicions       -1.57
disagreement     -1.55
argument         -1.55
Shares           -1.52
JP               -1.51
retweet          -1.50
hears            -1.49
Heidi            -1.48
Positive Logits
tremend          2.09
aceutical        1.98
apons            1.84
aeda             1.83
�                1.72
foremost         1.70
thood            1.69
sparing          1.65
astered          1.65
civilisation     1.61
Activations Density 0.000%
This feature has no known activations.