INDEX
Explanations
No Explanations Found
Negative Logits
envy          -0.67
arez          -0.64
ropy          -0.64
elvet         -0.64
arrogance     -0.63
compliments   -0.62
obesity       -0.62
frontrunner   -0.61
pport         -0.61
Deputy        -0.61
Positive Logits
ten           0.85
adj           0.81
sold          0.79
techn         0.78
location      0.74
technical     0.73
format        0.73
portion       0.72
bos           0.69
angan         0.69
Activations Density 0.000%
No Known Activations
This feature has no known activations.