INDEX
Explanations
No Explanations Found
Negative Logits
efeated    -0.78
nces       -0.77
gets       -0.76
rix        -0.70
cribed     -0.67
ylene      -0.67
Shrine     -0.67
rer        -0.66
ults       -0.65
Chains     -0.64
Positive Logits
romeda            0.64
alin              0.61
Idlib             0.61
reconstruction    0.60
conclud           0.60
polarization      0.59
henko             0.59
parting           0.58
shaping           0.56
Punjab            0.56
Activations Density: 0.000%
No Known Activations
This feature has no known activations.