INDEX
Explanations
proper names, particularly those of researchers and contributors in scientific publications
Negative Logits
otte       -0.15
rost       -0.15
dera       -0.15
awl        -0.14
etty       -0.14
raft       -0.14
arian      -0.13
tarz       -0.13
кал        -0.13
ophobia    -0.13
Positive Logits
conc       0.15
uede       0.15
ves        0.15
orean      0.14
ahoo       0.14
ij         0.14
ta         0.14
lesson     0.14
æĭ³        0.14
Thi        0.14
Activations Density 0.292%