Explanations
No Explanations Found
Negative Logits
Token        Logit
’            -1.93
habrían      -1.88
podían       -1.80
comenzaron   -1.73
empezaron    -1.73
querían      -1.71
tienden      -1.69
pueden       -1.66
tenían       -1.63
鱻           -1.61
Positive Logits
Token        Logit
0            1.78
shows        1.66
with         1.65
does         1.50
着一个       1.49
did          1.47
can          1.45
并非         1.41
って思       1.38
performed    1.34
Activations Density: 0.000%
No Known Activations
This feature has no known activations.