Explanations
phrases related to actions or behaviors
phrases that indicate actions or roles played by subjects, particularly in the context of legal or moral implications
Negative Logits
Token      Logit
Cum        -0.70
Topics     -0.68
Gloss      -0.66
Thrust     -0.64
arnaev     -0.63
Shards     -0.63
Celebr     -0.60
topics     -0.60
ribes      -0.60
Recipes    -0.59
Positive Logits
Token          Logit
opposite       0.77
differently    0.76
behalf         0.72
unilaterally   0.69
RECT           0.68
uously         0.68
ively          0.67
veto           0.67
instinct       0.66
inverse        0.65
Activations Density: 0.105%