INDEX
Explanations
No Explanations Found
Negative Logits
nir             -0.84
ANK             -0.82
esta            -0.79
heddar          -0.77
flies           -0.73
bush            -0.72
Greek           -0.71
Yamato          -0.69
rain            -0.69
¿               -0.68
Positive Logits
eatures         0.86
boycott         0.84
eport           0.71
refrain         0.68
discourse       0.65
recreation      0.65
abstinence      0.64
ruciating       0.63
clinch          0.63
impossibility   0.62
Activations Density 0.000%
This feature has no known activations.