INDEX
Explanations
conditional statements starting with "Had"
phrases that include the word "Had" indicating hypothetical scenarios or conditions
Negative Logits
outp         -0.64
neigh        -0.64
FTWARE       -0.62
repay        -0.60
juggling     -0.59
ilings       -0.59
shake        -0.59
shedding     -0.59
dish         -0.59
scrimmage    -0.58
Positive Logits
iths         1.05
been         1.00
luck         0.95
ith          0.93
kson         0.92
been         0.91
rons         0.89
hers         0.88
oop          0.86
ron          0.86
Activations Density 0.102%
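
As a rough illustration of how a density figure like the one above could be read: the sketch below assumes that "activations density" means the fraction of sampled token positions on which the feature's activation is above zero. That definition, the function name activation_density, the threshold parameter, and the toy data are all assumptions made for illustration; they are not taken from this page or from any particular tool.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: density = (# active positions) / (# positions sampled).
    """
    if len(activations) == 0:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 1 of 1000 token positions.
toy_activations = [0.0] * 999 + [3.2]
density = activation_density(toy_activations)

# Format as a percentage, the way the dashboard reports it.
print(f"Activations Density {density:.3%}")
```

Under this assumed definition, a density of 0.102% would correspond to the feature being active on roughly one token in a thousand.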