INDEX
Explanations
instances of personal narratives and experiences
Negative Logits
risen        -0.22
られている   -0.20
bitten       -0.20
blown        -0.19
пода         -0.19
fallen       -0.18
trailed      -0.18
ridden       -0.18
flown        -0.18
っている     -0.18
Positive Logits
was          0.39
did          0.39
took         0.36
gave         0.35
didn         0.32
went         0.32
became       0.31
came         0.29
was          0.29
didnt        0.29
Activations Density 3.084%