INDEX
Explanations
phrases indicating completion or accomplishment
instances of the word "done" in various contexts
Negative Logits
lights      -0.73
ulates      -0.73
olson       -0.71
azines      -0.68
ÏĦ          -0.68
anmar       -0.65
ulative     -0.65
eros        -0.63
Belief      -0.63
mare        -0.63
Positive Logits
pez          1.27
ggie         0.78
chy          0.77
differently  0.75
administr    0.71
omething     0.70
brisk        0.70
away         0.69
ername       0.68
poorly       0.66
Activations Density: 0.038%