INDEX
Explanations
phrases indicating newcomers or beginners
Negative Logits
nearest    -0.15
mau        -0.14
ogue       -0.14
eso        -0.14
Majority   -0.14
itto       -0.14
ours       -0.13
nearest    -0.13
åıĶ        -0.13
stad       -0.13
Positive Logits
ucher      0.16
098        0.16
oje        0.15
sy         0.15
fait       0.15
Roose      0.15
olly       0.15
erdem      0.14
erals      0.14
erif       0.14
Activations Density 0.009%