INDEX
Explanations
significant actions or decisions and their implications
Negative Logits
yl        -0.14
Tell      -0.14
Tells     -0.14
quette    -0.14
eder      -0.13
tle       -0.13
Changed   -0.13
emaakt    -0.13
Dit       -0.13
pf        -0.13
Positive Logits
follows   0.44
comes     0.38
follow    0.35
Follow    0.34
Follow    0.32
follow    0.31
comes     0.30
.follow   0.30
Comes     0.29
FOLLOW    0.27
Activations Density 0.106%