INDEX
Explanations
phrases related to personal experiences and interactions
narratives and personal stories related to experiences and interactions with others
Negative Logits
awaits      -0.80
joice       -0.71
await       -0.69
warns       -0.69
Parables    -0.66
wield       -0.65
foes        -0.65
urges       -0.64
welcomes    -0.63
recy        -0.63
Positive Logits
didn         0.96
hindsight    0.94
[            0.92
['           0.91
Initially    0.90
went         0.90
Came         0.89
Eventually   0.88
was          0.87
Ended        0.86
Activations Density 0.633%