Explanations
No Explanations Found
Negative Logits
Token     Logit
antz      -0.81
alf       -0.77
minist    -0.75
tein      -0.73
sburg     -0.73
audi      -0.69
ende      -0.68
ëĭ        -0.68
arna      -0.67
afa       -0.66
Positive Logits
Token     Logit
rote      0.85
DEFENSE   0.71
ccording  0.68
rotein    0.67
Prec      0.64
repe      0.64
Probe     0.62
Batt      0.62
looph     0.61
rons      0.61
Activations Density 0.000%
No Known Activations
This feature has no known activations.