Explanations
No explanations found.
Negative Logits
Token      Logit
HB         -0.76
enhagen    -0.71
ropolis    -0.71
Tea        -0.70
amaz       -0.69
adata      -0.69
utical     -0.67
pedia      -0.66
isance     -0.66
alian      -0.65
Positive Logits
Token      Logit
veter       0.74
ELF         0.74
xual        0.72
rounded     0.70
fired       0.70
athlet      0.65
istar       0.65
unemploy    0.65
firing      0.65
overcl      0.62
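The two lists rank tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such a ranking could be produced, assuming (as is common for sparse-autoencoder feature dashboards) that each value is the dot product of the feature's decoder direction with a token's unembedding vector. The function name `top_logits`, the 2-dimensional vectors and the toy vocabulary are hypothetical illustration data, not this feature's actual weights.

```python
from typing import Dict, List, Tuple


def top_logits(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative token logits.

    Assumption (not confirmed by the page): a token's logit is the dot
    product of the feature's decoder direction with that token's
    unembedding vector.
    """
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    # Sort ascending so the most negative scores come first.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Hypothetical toy data: exact binary fractions keep the arithmetic exact.
decoder = [1.0, -0.5]
vocab = {
    "fired": [0.75, -0.25],
    "athlet": [0.5, 0.0],
    "Tea": [-0.75, 0.25],
    "pedia": [-0.25, 0.5],
}
pos, neg = top_logits(decoder, vocab, k=2)
print("positive:", pos)
print("negative:", neg)
```

With these toy vectors, `fired` and `athlet` land in the positive list and `Tea` and `pedia` in the negative list; with real model weights the same ranking step would run over the whole vocabulary.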
Activation Density: 0.000%
This feature has no known activations.