Explanations
past habitual actions or states
the phrase "used to" indicating past habits or states
Negative Logits
edIn        -0.76
rising      -0.74
leaf        -0.74
oval        -0.73
imov        -0.69
agonists    -0.68
irez        -0.68
states      -0.67
IDA         -0.66
fixes       -0.65
Positive Logits
joke        1.16
haunt       1.01
rely        1.01
be          1.00
tease       0.97
adore       0.95
operate     0.95
laugh       0.94
belong      0.94
dominate    0.94
Activations Density 0.027%