INDEX
Explanations
note, followed by characteristics
Negative Logits
throated    -0.79
шит         -0.78
lille       -0.76
Kini        -0.74
sexto       -0.73
приятия     -0.71
腩          -0.71
ritter      -0.69
uren        -0.69
英語        -0.68
POSITIVE LOGITS
Note        3.22
Note        3.05
note        3.03
note        2.48
notes       2.44
Notes       2.42
NOTE        2.38
NOTE        2.38
Notes       2.20
NOTES       1.81
Activations Density 0.014%