Explanations
words related to personal experiences and challenges
expressions related to critical thinking and social awareness
Negative Logits
barring      -0.67
theirs       -0.66
except       -0.65
amera        -0.64
ARE          -0.62
warning      -0.62
alleging     -0.62
urging       -0.60
supplying    -0.60
uploads      -0.59
Positive Logits
oneself       1.34
yourself      1.02
Yourself      0.93
azeera        0.74
myself        0.72
arenthood     0.71
shitty        0.70
hindsight     0.66
entails       0.65
mates         0.63
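Token lists like these are commonly produced in sparse-autoencoder feature dashboards by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most positive and most negative per-token scores. The sketch below assumes that procedure; it is not confirmed to be how this particular page computed its numbers. The names `decoder_direction`, `unembed`, and `vocab` are hypothetical, and the toy values are invented for illustration, not taken from this feature.

```python
def logit_effects(decoder_direction, unembed):
    """Assumed method: scores[j] = sum_i d[i] * W_U[i][j] (direction projected through the unembedding)."""
    d_model = len(decoder_direction)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_direction[i] * unembed[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]


def top_and_bottom(vocab, scores, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]        # ascending order: most negative first
    top = ranked[::-1][:k]     # descending order: most positive first
    return top, bottom


# Hypothetical toy data: d_model = 2, vocabulary of 4 tokens.
vocab = ["oneself", "yourself", "barring", "except"]
decoder_direction = [1.0, -0.5]
unembed = [
    [1.0, 0.8, -0.4, -0.3],
    [0.0, 0.2, 0.5, 0.6],
]

scores = logit_effects(decoder_direction, unembed)
top, bottom = top_and_bottom(vocab, scores, k=2)
print("Positive:", top)
print("Negative:", bottom)
```

Because only the ranking matters for these lists, the sort is done once and both ends are read from it.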
Activation density: 0.644%
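Activation density is usually reported as the percentage of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly greater than zero; the threshold and the sample activations are illustrative assumptions, not values from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold` (assumed 0 by default)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations over 8 token positions; one of them is above zero.
acts = [0.0, 0.0, 2.3, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(acts):.3f}%")
```

Read this way, a density of 0.644% means the feature is active on roughly 644 of every 100,000 tokens, so it is a sparse feature.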