Explanations
No Explanations Found
Negative Logits
gage       -0.69
liga       -0.69
lied       -0.69
score      -0.68
ston       -0.66
Belt       -0.64
bidding    -0.64
fur        -0.63
knife      -0.63
furt       -0.63
Positive Logits
aryn       0.74
çīĪ        0.69
oyal       0.67
ushima     0.67
Both       0.65
abol       0.64
amph       0.63
both       0.63
ull        0.63
usting     0.62
Activations Density: 0.000%
No Known Activations
This feature has no known activations.