Explanations
phrases indicating actions or states of being
auxiliary verbs indicating actions or states
Negative Logits
Token       Logit
QB          -0.79
WAR         -0.65
amel        -0.64
incorpor    -0.62
imoto       -0.62
direction   -0.61
ops         -0.60
otide       -0.59
ibaba       -0.58
imer        -0.58
Positive Logits
Token       Logit
Geh         0.72
Sanct       0.70
Nost        0.68
initions    0.66
Kern        0.64
vation      0.64
gh          0.60
doi         0.60
ndra        0.59
ours        0.59
Activations Density: 0.051%