Explanations
No Explanations Found
Negative Logits
discretion    -0.65
cogn          -0.65
URES          -0.61
neut          -0.61
gast          -0.61
nia           -0.60
proportions   -0.59
Adv           -0.58
pse           -0.56
Canaver       -0.56
Positive Logits
obyl          0.83
tical         0.82
osher         0.74
phabet        0.71
paio          0.71
onel          0.70
Starts        0.70
ettel         0.68
olithic       0.68
andowski      0.68
Activation Density: 0.000%
No Known Activations
This feature has no known activations.