INDEX
Explanations
texts instructing someone to provide information or share details
requests for information or stories
Negative Logits
Token          Logit
urdue          -0.80
ILCS           -0.72
rane           -0.69
cdn            -0.66
namese         -0.64
zinski         -0.64
nam            -0.62
elimination    -0.60
JV             -0.60
hered          -0.59
Positive Logits
Token          Logit
tale           1.63
ingly          1.19
us             0.90
tell           0.86
tales          0.81
tale           0.81
biz            0.78
me             0.78
ously          0.75
tell           0.75
Activations Density: 0.056%