Explanations
references to classroom environments and activities
Negative Logits
kola   -0.19
uder   -0.17
ucken  -0.16
udi    -0.14
uddy   -0.14
oo     -0.14
съ     -0.14
uary   -0.14
Rig    -0.14
udem   -0.14
Positive Logits
/lab      0.16
otle      0.15
ді        0.15
abcdefgh  0.14
iž        0.14
ardin     0.14
Bravo     0.13
ettle     0.13
lar       0.13
ónico     0.13
Activation density: 0.024%