Explanations
high-frequency function words or common linguistic structures
Negative Logits
Token         Logit
abase         -0.17
izzo          -0.17
rvé           -0.16
ãĥ«ãĤ¯        -0.15
imp           -0.15
оже           -0.14
Bern          -0.14
Rosenstein    -0.14
irst          -0.14
agraph        -0.14
Positive Logits
Token         Logit
ija           0.17
dorf          0.15
uros          0.15
lers          0.15
ler           0.15
йн            0.15
detect        0.14
endor         0.14
ario          0.14
alis          0.14
Activations Density: 0.000%