Explanations
phrases indicating personal experiences or feelings
the word "was" in various contexts
Negative Logits
iosyncr      -0.73
worthiness   -0.68
ð            -0.68
izable       -0.67
Stability    -0.66
ottage       -0.65
士           -0.65
vernment     -0.65
exploits     -0.64
Extend       -0.64
Positive Logits
wondering    1.31
fortunate    1.26
lucky        1.22
amazed       1.21
tempted      1.20
expecting    1.20
thinking     1.19
afraid       1.17
surprised    1.16
hoping       1.16
Activations Density: 0.160%