INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.07
1:0.07
2:0.08
3:0.09
4:0.08
5:0.08
6:0.08
7:0.07
8:0.09
9:0.08
10:0.07
11:0.08
Negative Logits
aughs     -1.77
ejac      -1.68
Ack       -1.64
KB        -1.56
mun       -1.55
aloud     -1.53
AIR       -1.53
patch     -1.46
Snap      -1.42
cough     -1.40
Positive Logits
Reviewer  1.75
dden      1.71
livion    1.69
ignty     1.69
adra      1.67
forestry  1.65
mathemat  1.64
ende      1.64
faire     1.61
ynthesis  1.60
Activations Density 0.000%
No Known Activations
This feature has no known activations.