Explanations
phrases related to starting or initiating an action
verbs indicating the initiation or continuation of actions
Negative Logits
Surprise        -0.69
compliment      -0.66
contin          -0.64
fortunately     -0.64
continuation    -0.63
astern          -0.62
prem            -0.61
sun             -0.60
fore            -0.60
lat             -0.58
Positive Logits
anew            0.74
ories           0.70
ORK             0.68
edIn            0.67
ixel            0.66
phas            0.65
hao             0.63
circuits        0.61
daq             0.61
ury             0.60
Activations Density: 0.153%