Explanations
No Explanations Found
Negative Logits
plur        -0.78
Accessory   -0.74
observable  -0.71
Consumer    -0.70
abulary     -0.69
shenan      -0.68
ables       -0.65
freezer     -0.65
pant        -0.65
neut        -0.64
Positive Logits
olulu       0.92
recruit     0.75
ieri        0.66
awa         0.64
uchi        0.63
hra         0.62
nikov       0.62
Ghana       0.62
thia        0.61
uba         0.61
Activations Density 0.000%
No Known Activations
This feature has no known activations.