INDEX
Explanations
words indicating measurements or quantities
Negative Logits
tees      -0.18
ingly     -0.17
esz       -0.16
itious    -0.16
اÛĮÙĩ     -0.15
ties      -0.15
esine     -0.15
432       -0.15
259       -0.14
íĭ±       -0.14
Positive Logits
erva       0.16
ney        0.16
ning       0.16
net        0.15
ner        0.15
_attached  0.14
eral       0.14
Ģ          0.14
ninger     0.14
nie        0.14
Activations Density 0.049%
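
As a rough illustration of how a density figure like the one above can be read, the sketch below assumes that activation density means the fraction of tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are assumptions made for illustration only; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 49 of 100,000 tokens.
sample = [1.0] * 49 + [0.0] * 99_951
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.049%
```

Under that reading, a density of 0.049% corresponds to roughly one active token in every two thousand.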