INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.08
1:0.06
2:0.07
3:0.07
4:0.10
5:0.08
6:0.07
7:0.08
8:0.07
9:0.08
10:0.10
11:0.08
Negative Logits
offending: -1.65
lessly: -1.57
********************************: -1.56
unavoidable: -1.52
indisc: -1.52
safely: -1.51
eware: -1.51
ardless: -1.51
containing: -1.46
uously: -1.45
Positive Logits
Flight: 2.08
shapeshifter: 1.76
GPU: 1.72
�: 1.68
rists: 1.67
Robotics: 1.66
Interest: 1.66
rius: 1.62
Dynamics: 1.61
mosp: 1.61
Activations Density 0.000%
This feature has no known activations.