INDEX
Explanations
No Explanations Found
Negative Logits
recall        -0.69
bully         -0.66
decide        -0.65
fibre         -0.65
bothered      -0.65
organising    -0.63
¡             -0.63
Able          -0.63
sponsoring    -0.62
Docker        -0.62
Positive Logits
pour          0.80
tesque        0.77
OAD           0.72
Pry           0.72
fty           0.71
hib           0.71
itri          0.69
ãģĤ           0.68
igun          0.68
MU            0.67
Activations Density: 0.000%
No Known Activations
This feature has no known activations.