Explanations
modal verbs and expressions of probability or expectation
Negative Logits
aph       -0.15
axter     -0.15
Ïĥμ       -0.14
iasm      -0.14
ipi       -0.14
elson     -0.14
iou       -0.14
коп       -0.14
adar      -0.14
ception   -0.14
Positive Logits
illas     0.19
possibly  0.17
ds        0.17
Poss      0.16
hart      0.16
end       0.16
cert      0.16
swims     0.15
orig      0.15
function  0.15
Activations Density 0.049%