Explanations
phrases related to actions that involve influence, control, or response
words indicating actions or intentions
Negative Logits
Token        Logit
abiding      -0.85
afety        -0.75
answered     -0.67
76561        -0.66
ById         -0.66
apologised   -0.66
checking     -0.64
Died         -0.63
cens         -0.63
breeding     -0.63
Positive Logits
Token        Logit
elevate      0.91
begin        0.91
accelerate   0.91
unleash      0.89
extend       0.88
bring        0.85
widen        0.85
seize        0.83
expand       0.83
plunge       0.83
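The negative and positive logit lists read as the tokens whose output logits this feature lowers or raises the most. A minimal sketch of how such lists could be extracted, assuming the per-token effects are already available as a plain mapping: the function `top_logits` and the dict `logit_effects` are hypothetical names for illustration, and the values are a subset of the figures listed here.

```python
def top_logits(logit_effects, k=3):
    """Return the k most negative and k most positive (token, logit) pairs."""
    # Sort ascending by logit value: most negative first.
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    # Reverse so the most positive comes first, then take k.
    positive = ranked[::-1][:k]
    return negative, positive


# Hypothetical input: a subset of the token/logit pairs from this page.
logit_effects = {
    "abiding": -0.85,
    "afety": -0.75,
    "answered": -0.67,
    "elevate": 0.91,
    "unleash": 0.89,
    "extend": 0.88,
    "bring": 0.85,
}

negative, positive = top_logits(logit_effects, k=3)
print(negative)
print(positive)
```

How the per-token effects themselves are computed is not stated on this page, so the sketch covers only the ranking step.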
Activations Density: 0.490%
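Activation density is usually read as the fraction of sampled tokens on which the feature activates. A minimal sketch under that assumption follows. The function `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 tokens, 5 of which activate the feature.
sample = [0.0] * 995 + [0.3, 1.2, 0.7, 2.1, 0.05]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.490% would mean the feature is active on roughly 1 in 200 sampled tokens.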