INDEX
Explanations
No Explanations Found
Negative Logits
cuts         -0.75
bread        -0.69
auctions     -0.65
breat        -0.65
plates       -0.65
hal          -0.63
pine         -0.63
orders       -0.63
sandwiches   -0.63
sand         -0.62
Positive Logits
UTC          0.73
elf          0.73
mbuds        0.69
Peng         0.69
Flavoring    0.68
Tiff         0.68
ļé           0.67
yrinth       0.67
KGB          0.66
Beir         0.65
Activations Density 0.000%
No Known Activations
This feature has no known activations.