INDEX
Explanations
No Explanations Found
NEGATIVE LOGITS
Token         Logit
daq           -0.72
pour          -0.71
fortune       -0.69
antha         -0.67
Sawyer        -0.66
Rockefeller   -0.65
stress        -0.64
sell          -0.64
Chomsky       -0.64
querque       -0.63
POSITIVE LOGITS
Token         Logit
GU            0.69
Ñģ            0.68
uru           0.65
imet          0.64
romy          0.62
л             0.61
20439         0.60
KI            0.60
çīĪ           0.60
HA            0.59
Activations Density: 0.000%
This feature has no known activations.