Explanations
phrases related to personal reflections and experiences
expressions of personal emotional experiences and reflections
Negative Logits
Token        Logit
£ı           -0.69
ardless      -0.67
overseen     -0.66
æ©Ł          -0.66
arez         -0.63
duly         -0.63
championed   -0.62
ivably       -0.61
vernment     -0.60
stewards     -0.60
Positive Logits
Token        Logit
me           1.63
my           1.16
him          1.03
him          0.92
us           0.89
mine         0.83
My           0.81
waking       0.80
me           0.80
nerve        0.77
Activations Density 0.333%