INDEX
Explanations
words and phrases that convey strength or intensity
Negative Logits
ized        -0.19
aled        -0.16
ë¡ľ         -0.16
ION         -0.16
bian        -0.15
ollapsed    -0.15
led         -0.15
jur         -0.15
šk          -0.15
ohn         -0.15
Positive Logits
holds       0.30
mẽ          0.27
-strong     0.24
(er         0.23
bow         0.22
,strong     0.21
/we         0.21
çĥĪ         0.20
strong      0.19
sville      0.19
Activations Density 0.036%