INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.09
1:0.05
2:0.09
3:0.08
4:0.08
5:0.08
6:0.07
7:0.08
8:0.08
9:0.08
10:0.09
11:0.08
Negative Logits
itored         -1.91
untled         -1.81
utonium        -1.69
cohol          -1.66
ACTED          -1.66
Sending        -1.57
Probe          -1.51
omore          -1.50
PsyNetMessage  -1.48
selves         -1.47
Positive Logits
simplicity  1.70
RAG         1.68
Monkey      1.67
wonders     1.59
Browser     1.48
hon         1.48
wic         1.47
motto       1.45
FANT        1.44
Sim         1.44
Activations Density 0.000%
This feature has no known activations.