INDEX
Explanations
No Explanations Found
Negative Logits
BER           -0.75
Ply           -0.72
ighed         -0.71
Nas           -0.70
bane          -0.66
Constructed   -0.64
Bron          -0.63
hirt          -0.62
ellect        -0.62
quote         -0.61
Positive Logits
artifacts     0.77
axter         0.76
samurai       0.74
URA           0.70
torches       0.70
flashback     0.69
olla          0.68
Dino          0.68
squid         0.66
Armageddon    0.65
Activations Density: 0.000%
No Known Activations
This feature has no known activations.