Explanations
offers of further explanation
Negative Logits
**:      0.66
**,      0.65
:**      0.63
*,       0.62
...",    0.61
*:       0.60
:",      0.58
:        0.55
…,       0.54
”:       0.54
Positive Logits
Cheers   0.77
Hope     0.74
Thanks   0.69
saludos  0.68
hope     0.67
chevron  0.66
Wasch    0.65
Danke    0.65
awcy     0.64
Shame    0.64
Activations Density 0.245%