INDEX
Explanations
words related to communication, specifically involving telling or explaining something to others
expressions of communication or assertions made by the speaker
Negative Logits
idges        -0.70
eele         -0.67
metadata     -0.67
ONSORED      -0.66
scope        -0.65
ynski        -0.64
thinkable    -0.62
atus         -0.62
helps        -0.62
artifacts    -0.60
Positive Logits
myself       1.86
my           1.20
him          1.18
ourselves    1.12
them         1.07
everybody    1.01
anybody      1.01
somebody     0.98
everyone     0.95
THEM         0.93
Activations Density 0.453%