INDEX
Explanations
phrases related to planned actions or strategies
articles and phrases indicating attempts or endeavors
Negative Logits
izons          -0.76
omever         -0.72
alions         -0.71
Egyptians      -0.64
onents         -0.63
aughter        -0.61
ences          -0.61
ouls           -0.60
utters         -0.60
iciency        -0.59
Positive Logits
bid             1.06
effort          0.99
nutshell        0.94
attempt         0.94
stakes          0.90
reversal        0.89
testament       0.85
unprecedented   0.84
manner          0.84
ploy            0.83
Activation Density: 0.217%
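The page gives the logit lists and the density figure without saying how they were computed. A common way to produce such numbers for a sparse-autoencoder feature, offered here only as an assumption, is to project the feature's decoder direction through the model's unembedding matrix, sort the per-token scores to get the positive and negative lists, and report activation density as the share of token positions where the feature's activation is above zero. The minimal sketch below follows that assumption. The function names and the tiny toy matrices are hypothetical and do not reproduce the model behind this page.

```python
from typing import List, Sequence, Tuple

# Assumption: logit scores = decoder direction (length d_model) @ unembedding (d_model x vocab).
def logit_weights(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Return one score per vocabulary token for a single feature."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]

def top_and_bottom(scores: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Split tokens into the k highest (positive) and k lowest (negative) scores."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]                    # most negative first
    positive = list(reversed(ranked[-k:]))   # most positive first
    return positive, negative

def activation_density(activations: Sequence[float]) -> float:
    """Percentage of token positions where the feature activation is above zero."""
    if not activations:
        return 0.0
    return 100.0 * sum(1 for a in activations if a > 0) / len(activations)

# Hypothetical toy data: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["bid", "effort", "izons", "omever"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.8, 0.6, -0.3, -0.2],
    [-0.5, -0.4, 0.9, 0.8],
]

if __name__ == "__main__":
    positive, negative = top_and_bottom(logit_weights(decoder_dir, unembed), vocab, k=2)
    print("Positive:", positive)
    print("Negative:", negative)
    print("Density: %.3f%%" % activation_density([0.0, 0.0, 1.2, 0.0]))
```

With real data, `decoder_dir` would be the feature's row of the SAE decoder and `unembed` the model's unembedding matrix. Many dashboards also apply the final layer norm first, which this sketch leaves out.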