Explanations
phrases expressing strong enthusiasm or commitment towards a subject or activity
Negative Logits
erness       -0.17
ázev         -0.16
ture         -0.15
warf         -0.15
udeau        -0.15
anson        -0.15
ependency    -0.15
tual         -0.14
ken          -0.14
wij          -0.14
Positive Logits
Ramp         0.16
about        0.16
rh           0.15
About        0.14
amp          0.14
ouched       0.14
rouge        0.14
behalf       0.13
atic         0.13
ened         0.13
Activation density: 0.030%