INDEX
Explanations
language indicating future expectations or possibilities
phrases expressing future expectations or predictions
Negative Logits
sincerity        -0.66
PLIED            -0.63
Contributions    -0.61
rice             -0.61
volunt           -0.60
forms            -0.59
teammates        -0.59
personality      -0.59
charisma         -0.58
vanity           -0.58
Positive Logits
witnessing       1.28
glimpse          1.07
witness          1.05
seeing           1.03
glimps           1.01
witnessed        0.98
hear             0.98
hearing          0.92
ourselves        0.92
see              0.90
Activations Density 0.310%