INDEX
Explanations
phrases related to personal experience and opinion.
Negative Logits
Token       Logit
Nora        -0.56
Uriel       -0.53
Dickinson   -0.51
DeVos       -0.51
goodbye     -0.51
abdom       -0.50
Gale        -0.50
heres       -0.49
edient      -0.49
Mats        -0.47
Positive Logits
Token       Logit
'm          1.01
've         0.88
am          0.77
RL          0.74
pec         0.73
suppose     0.73
wish        0.72
myself      0.72
UC          0.70
stad        0.70
Activations Density: 11.214%