INDEX
Explanations
action-oriented phrases or statements containing the word "always."
phrases that indicate ongoing actions or continual states
Negative Logits
bia         -0.62
·           -0.61
ettlement   -0.61
ģ«          -0.59
ALLY        -0.59
VICE        -0.58
presently   -0.58
Yang        -0.58
dm          -0.57
NAME        -0.57
Positive Logits
ailable     0.73
ministic    0.70
ounters     0.70
andro       0.67
oin         0.67
pursu       0.67
rences      0.66
enos        0.65
dule        0.65
theless     0.64
Activations Density 0.181%