Explanations
terms associated with simplicity, basicness, or a lack of sophistication
Negative Logits
еч        -0.16
agate     -0.14
thew      -0.14
رز        -0.13
ean       -0.13
aku       -0.13
ordova    -0.13
iams      -0.13
ings      -0.13
帯        -0.13
Positive Logits
/simple    0.15
uder       0.15
/raw       0.15
caller     0.15
eti        0.14
/big       0.14
ترین       0.14
117        0.14
inte       0.14
/original  0.13
Activations Density 0.026%