INDEX
Explanations
phrases that express generalizations or broad statements
Negative Logits
orry        -0.16
ender       -0.16
itos        -0.16
antlr       -0.15
iti         -0.15
assa        -0.14
itur        -0.14
itespace    -0.14
chg         -0.14
275         -0.13
Positive Logits
Ïĥη         0.17
éģ          0.16
<context    0.15
-speaking   0.15
generally   0.15
Guy         0.14
-ie         0.14
æĹ          0.14
lez         0.14
/general    0.14
Activations Density 0.074%