INDEX
Explanations
phrases related to communication, specifically the act of telling or informing someone about something
Negative Logits
adesh         -0.82
imposed       -0.73
activity      -0.71
sidx          -0.70
everal        -0.69
berus         -0.69
served        -0.69
conservancy   -0.67
cells         -0.66
aband         -0.65
Positive Logits
tale          1.43
lies          1.12
stories       1.09
ingly         1.08
jokes         1.04
tales         1.03
us            1.02
Tale          0.89
tale          0.88
Lies          0.87
Activations Density: 0.050%