INDEX
Explanations
phrases that indicate a change or guidance in movement or strategy
Negative Logits
esters    -0.72
aqu       -0.68
itted     -0.66
oak       -0.66
asts      -0.65
Byrne     -0.63
enos      -0.63
ammy      -0.62
Mini      -0.62
tein      -0.61
Positive Logits
direction     1.29
ality         1.15
directions    1.07
finding       0.93
ward          0.92
ally          0.90
finder        0.87
towards       0.86
toward        0.85
ational       0.84
Activations Density 0.011%