INDEX
Explanations
phrases related to acquiring knowledge or education
Negative Logits
Token      Logit
AW         -0.16
e          -0.15
/to        -0.15
och        -0.14
upro       -0.14
SM         -0.14
libertine  -0.13
AWN        -0.13
allon      -0.13
luk        -0.13
Positive Logits
Token        Logit
about        0.19
Gover        0.17
depos        0.15
skirts       0.15
’ta          0.15
ldb          0.15
ntax         0.15
OffsetTable  0.15
how          0.15
rys          0.15
Activations Density 0.033%