Explanations
verbs related to actions or decisions being taken
Negative Logits
Token           Logit
Cong            -0.71
accompanies     -0.70
ese             -0.67
dissatisf       -0.64
livest          -0.62
bie             -0.61
cape            -0.59
Friend          -0.59
Smile           -0.58
Bey             -0.58
Positive Logits
Token           Logit
advantage       1.23
aways           1.08
heed            0.97
precautions     0.95
aback           0.95
care            0.92
refuge          0.89
responsibility  0.86
precaution      0.84
aim             0.83
Activations Density: 0.099%
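
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number is commonly computed, assuming density means the fraction of token positions where the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from this page; the dashboard's actual computation may differ.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 1 token out of 1000.
sample = [0.0] * 999 + [2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that reading, a density of 0.099% would correspond to the feature firing on roughly one token in a thousand.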