INDEX
Explanations
questions about events and consequences
Negative Logits
ament      -1.25
tesy       -1.25
cius       -1.21
mens       -1.17
staff      -1.16
bors       -1.13
overed     -1.11
recy       -1.07
rehend     -1.07
ways       -1.06
Positive Logits
uate        1.28
unfold      1.27
happen      1.19
unfolding   1.19
uates       1.14
happening   1.07
havoc       1.03
uating      0.99
transpired  0.97
happens     0.97
Activations Density 0.359%