INDEX
Explanations
prominent phrases relating to specific contexts or topics, without a consistent overarching theme
phrases and words related to engagement and positivity
Negative Logits
Token    Logit
``       -1.51
�        -1.37
``       -1.01
''       -0.98
�        -0.95
_        -0.91
``(      -0.89
----------------------------------------------------------------  -0.89
.--      -0.89
.''.     -0.81
Positive Logits
Token    Logit
…        2.78
….       2.58
…        2.50
…)       2.43
…]       2.32
…..      2.27
…"       2.24
"…       2.20
[…]      2.19
…."      2.08
Activations Density 0.176%
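
As a rough illustration of how a density figure like the one above could be read, the sketch below treats activation density as the fraction of token positions on which the feature fires. This definition, the function name `activation_density`, the `threshold` parameter, and the sample values are all assumptions made for illustration; the page itself does not state how its density is computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature is active.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 2 of 1000 token positions.
sample = [0.0] * 998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.176% would mean the feature is active on roughly 1 in every 570 tokens.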