Explanations
uncertainty or speculation about potential future outcomes based on current events or actions
conditional and hypothetical phrases about possibilities and outcomes
Negative Logits
Token       Logit
onz         -0.69
accepting   -0.65
citing      -0.61
iard        -0.60
lication    -0.57
waging      -0.56
rel         -0.56
Maid        -0.56
vati        -0.55
pard        -0.54
Positive Logits
Token       Logit
happen      1.56
happened    1.24
transpired  1.20
happens     1.13
happ        1.08
occur       1.06
motivate    1.02
Happ        1.00
constitute  0.90
await       0.84
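This page does not say how the logit lists were computed. A common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and keep the most positive and most negative entries. The sketch below illustrates that ranking step on made-up toy vectors; `decoder_dir`, `unembed`, `logit_effects`, `top_k` and the `tok_*` names are hypothetical and are not taken from this feature or from any specific library.

```python
from typing import Dict, List, Tuple

# Hypothetical toy inputs (made-up numbers): a 2-dimensional feature decoder
# direction and a 4-token "unembedding", one column per token. Real models
# use many more dimensions and a full vocabulary.
decoder_dir: List[float] = [1.0, -0.5]
unembed: Dict[str, List[float]] = {
    "tok_a": [1.2, -0.6],
    "tok_b": [0.8, -0.5],
    "tok_c": [-0.4, 0.5],
    "tok_d": [-0.3, 0.2],
}


def logit_effects(direction: List[float], columns: Dict[str, List[float]]) -> Dict[str, float]:
    """Direct effect on each token's logit: dot(direction, unembedding column)."""
    return {tok: sum(d * w for d, w in zip(direction, col)) for tok, col in columns.items()}


def top_k(effects: Dict[str, float], k: int, positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k most positive (or, with positive=False, most negative) tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


effects = logit_effects(decoder_dir, unembed)
print("positive:", top_k(effects, 2))
print("negative:", top_k(effects, 2, positive=False))
```

Under this assumption, a large positive score means the feature, when active, directly pushes the model toward predicting that token, which is consistent with the "happen"/"occur" tokens on the positive side here.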
Activation density: 0.095%
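Activation density is usually reported as the percentage of sampled tokens on which the feature fires, that is, has an activation above zero. The page does not state its threshold or sample size, so both are assumptions in this sketch; `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0.0)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 95 active tokens out of 100,000.
sample = [0.0] * 99_905 + [0.7] * 95
density = activation_density(sample)
print(f"{density:.3f}%")
print(f"about 1 in {100.0 / density:.0f} tokens")
```

A density of 0.095% corresponds to roughly one activating token in every 1,050, so this is a sparse feature.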