Explanations
language-related terms
references to language and its various forms and uses
Negative Logits
ilon     -0.85
roxy     -0.84
ilts     -0.83
apego    -0.80
uds      -0.80
agher    -0.78
ECD      -0.77
hap      -0.76
rodu     -0.75
uden     -0.75
Positive Logits
language       0.96
learners       0.94
spoken         0.92
language       0.92
interpreter    0.85
anguage        0.83
immersion      0.81
lear           0.79
instruction    0.75
barrier        0.75
Activations Density: 0.016%