INDEX
Explanations
No Explanations Found
Negative Logits
oversight    -0.82
UGC          -0.75
OGR          -0.69
prelim       -0.68
avez         -0.66
chore        -0.64
Nicaragua    -0.64
Bahá         -0.64
appellate    -0.62
uala         -0.61
Positive Logits
arin         0.74
mercial      0.74
stem         0.73
etry         0.69
wagon        0.67
arine        0.66
ritional     0.63
atively      0.62
izen         0.62
teasp        0.62
Activations Density 0.000%
This feature has no known activations.