INDEX
Explanations
phrases indicating a method or course of action
phrases indicating methods or approaches to achieve specific outcomes
Negative Logits
ĸļ        -0.95
usters    -0.81
asts      -0.72
riks      -0.70
encer     -0.69
anmar     -0.65
etheus    -0.64
uster     -0.64
oubted    -0.63
aredevil  -0.63
Positive Logits
ward      0.89
finding   0.86
fare      0.84
point     0.80
allo      0.71
way       0.71
forward   0.70
forward   0.69
NE        0.69
bm        0.69
Activations Density 0.025%