INDEX
Explanations
No Explanations Found
Negative Logits
pity       -0.87
mercy      -0.75
utonium    -0.73
parity     -0.71
uno        -0.71
hex        -0.70
ideo       -0.70
age        -0.69
urses      -0.67
psons      -0.66
Positive Logits
Interstitial     0.84
SPONSORED        0.80
RELATED          0.79
FUN              0.76
VERTISEMENT      0.76
ADVERTISEMENT    0.75
PHOTOS           0.75
Questions        0.74
Tokens           0.74
Introduced       0.72
Activations Density: 0.000%
No Known Activations
This feature has no known activations.