INDEX
Explanations
mentions of speaking or expressing opinions
Negative Logits
aml     -0.16
ated    -0.15
om      -0.15
uv      -0.15
ment    -0.15
roit    -0.15
uw      -0.14
reet    -0.14
fully   -0.14
Äħ      -0.14
Positive Logits
volumes      0.27
fluent       0.18
ertest       0.17
Volumes      0.15
spe          0.15
volume       0.15
engagements  0.15
ланд         0.15
olumes       0.14
louder       0.14
Activations Density: 0.028%