INDEX
Explanations
words that indicate capability or suitability
Negative Logits
ed          -0.28
ing         -0.26
edb         -0.22
arily       -0.22
ical        -0.20
emann       -0.17
naires      -0.17
fulness     -0.17
ese         -0.17
naire       -0.17
Positive Logits
able        0.25
atable      0.25
enough      0.23
(no token shown)  0.23
ble         0.20
/un         0.20
mente       0.20
ABLE        0.20
/non        0.20
/read       0.19
Activations Density 0.151%