INDEX
Explanations
No Explanations Found
Negative Logits
Token         Logit
ike           -0.26
}\\           -0.24
preferring    -0.24
stained       -0.23
_builtin      -0.23
aar           -0.23
ILA           -0.23
brush         -0.23
(reinterpret  -0.23
atatype       -0.23
Positive Logits
Token         Logit
ç»ŀ           0.28
gist          0.26
æĪİ           0.25
flow          0.25
-flow         0.25
zg            0.25
_flow         0.25
epar          0.24
Himself       0.24
pair          0.24
Activations Density 0.005%
No Known Activations
This feature has no known activations.