Explanations
No Explanations Found
Negative Logits
ubi     -0.88
sever   -0.74
pots    -0.70
notes   -0.68
aways   -0.68
wine    -0.68
ivably  -0.68
usb     -0.67
humans  -0.67
ichick  -0.67
Positive Logits
same       1.02
opposite   0.85
highest    0.82
lowest     0.81
latter     0.76
overall    0.75
extent     0.73
onset      0.72
average    0.72
remainder  0.71
Activations Density 0.000%
No Known Activations
This feature has no known activations.