Explanations
references to different levels or scales of analysis in various contexts
Negative Logits
Token      Logit
itä        -0.17
ktop       -0.15
_native    -0.15
ekim       -0.14
umbn       -0.14
etty       -0.14
lest       -0.13
самое      -0.13
emento     -0.13
ALAR       -0.13
Positive Logits
Token      Logit
Wol        0.16
eus        0.15
_macro     0.15
atz        0.15
Bram       0.15
Peters     0.14
Booth      0.14
ιλο        0.14
wol        0.14
chos       0.14
Activations Density 0.577%