Explanations
words that suggest the capability or ability of objects or concepts
Negative Logits
n        -0.80
m        -0.77
ing      -0.77
9        -0.66
es       -0.64
th       -0.64
T        -0.63
T        -0.62
2        -0.62
<eos>    -0.61
Positive Logits
izable       1.36
vable        1.29
urable       1.23
gable        1.21
chable       1.20
asable       1.17
able         1.15
Efq          1.14
Theſe        1.13
mountable    1.11
Activations Density 0.323%