INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.08
1:0.06
2:0.08
3:0.09
4:0.08
5:0.08
6:0.07
7:0.08
8:0.08
9:0.08
10:0.09
11:0.07
Negative Logits
Dek       -3.00
+++       -2.74
tasted    -2.61
IRED      -2.53
SUN       -2.50
expelled  -2.49
Sack      -2.42
ilda      -2.41
SCHOOL    -2.41
DID       -2.40
Positive Logits
messenger  2.90
weather    2.87
promoter   2.61
orum       2.54
thor       2.54
fin        2.52
etr        2.48
spoiler    2.44
radiator   2.43
pend       2.40
Activations Density 0.000%
This feature has no known activations.