INDEX
Explanations
phrases related to personal identity and self-discovery
connections and relationships in personal narratives
Negative Logits
rique       -0.95
rers        -0.80
visors      -0.71
onde        -0.69
tyard       -0.68
asio        -0.67
§           -0.67
reopened    -0.65
osures      -0.65
fronts      -0.65
Positive Logits
viz          0.82
Magikarp     0.82
namely       0.79
whether      0.77
THEN         0.75
etc          0.74
determines   0.74
subtract     0.71
BUT          0.68
Tokens       0.68
Activations Density 0.449%