INDEX
Explanations
verbs related to actions
occurrences of the phrase "doing" in various contexts
Negative Logits
lights  -0.69
sg      -0.66
wit     -0.64
antha   -0.62
pu      -0.61
untu    -0.61
ules    -0.61
liner   -0.61
dt      -0.61
coe     -0.59
Positive Logits
berman     0.76
nothing    0.74
brisk      0.73
女         0.73
something  0.70
terribly   0.70
THING      0.70
things     0.68
LE         0.67
alright    0.67
Activations Density 0.050%