Explanations
phrases related to actions or decisions
instances of the word "taken."
Negative Logits
Token      Logit
ulate      -0.62
raft       -0.61
liction    -0.58
rose       -0.58
lin        -0.56
rous       -0.56
ense       -0.56
lier       -0.56
saw        -0.56
ove        -0.55
Positive Logits
Token         Logit
taken         3.37
Taken         2.20
eaten         1.60
flown         1.57
undertaken    1.52
gone          1.37
borne         1.35
seized        1.35
done          1.30
thrown        1.30
Activations Density: 0.029%
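
To make the density figure concrete, here is a minimal sketch, assuming that "activations density" means the fraction of token positions on which the feature fires (activation above zero). The page itself does not define the term, so that reading is an assumption, and the function names `activation_density` and `expected_active_tokens` are hypothetical helpers, not part of any dashboard API.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: density counts a position as active when its activation
    is strictly greater than zero; the page does not state the rule.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def expected_active_tokens(density_percent, n_tokens):
    """Convert a density given in percent into an expected token count."""
    return density_percent / 100.0 * n_tokens


# Toy activations (made up for illustration): one active position out of four.
toy = [0.0, 0.0, 1.2, 0.0]
print(activation_density(toy))

# The reported 0.029% corresponds to roughly 29 active tokens per 100,000.
print(round(expected_active_tokens(0.029, 100_000)))
```

Under this reading, a density of 0.029% marks a sparse feature: it fires on about 29 of every 100,000 tokens, which fits a feature tied to a narrow set of words such as "taken".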