Explanations
phrases related to capability or ability
Negative Logits
Token     Logit
ness      -0.20
emente    -0.19
NESS      -0.17
rina      -0.16
acs       -0.15
iness     -0.15
enate     -0.15
baugh     -0.15
liness    -0.15
ocks      -0.15
Positive Logits
Token     Logit
to        0.25
uable     0.19
ted       0.19
able      0.18
åΰçļĦ     0.18
ble       0.18
pped      0.17
Able      0.17
bled      0.17
azed      0.16
Activations Density 0.051%