INDEX
Explanations
events or actions that happened in the past
the presence of the word "Back."
Negative Logits
inational   -0.68
viz         -0.63
orpor       -0.61
dioxide     -0.61
constitu    -0.61
risome      -0.60
×IJ         -0.60
izo         -0.59
wa          -0.58
tyr         -0.58
Positive Logits
wards       1.07
GROUND      1.07
lash        1.05
dated       1.05
packs       1.01
stories     0.98
stab        0.98
door        0.98
tracking    0.96
ward        0.94
Activations Density: 0.037%