Explanations
references to individuals involved in various actions and circumstances
Negative Logits
Token       Logit
wonder      -0.15
orda        -0.15
wondered    -0.14
リー        -0.14
UA          -0.14
eya         -0.14
gone        -0.14
terdam      -0.14
Wonder      -0.14
wonders     -0.13
Positive Logits
Token       Logit
ark         0.26
aves        0.21
term        0.18
helped      0.18
partially   0.17
terms       0.17
partly      0.17
ajud        0.17
later       0.17
inherited   0.17
Activations Density 0.147%