INDEX
Explanations
adjectives and terms related to ability and usability
Negative Logits
ed          -0.27
ing         -0.27
arily       -0.23
edb         -0.20
ical        -0.20
ically      -0.18
emann       -0.18
eday        -0.17
ede         -0.17
fulness     -0.17
Positive Logits
[token missing]  0.23
atable      0.20
able        0.20
/edit       0.19
/read       0.19
-bodied     0.18
mente       0.17
/un         0.17
/non        0.17
/use        0.17
Activations Density 0.147%
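
A density figure like the one above is usually read as the fraction of sampled token positions on which the feature fires. The sketch below shows that arithmetic under stated assumptions: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical, and the page does not say how its own figure was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature is active; the source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 3 active positions out of 2,000 tokens.
sample = [0.0] * 1997 + [0.8, 1.3, 0.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

A density well below 1% means the feature is sparse: it is silent on almost every token and fires only on a narrow set of contexts.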