INDEX
Explanations
adjectives that convey significance, intensity, or danger
Negative Logits
Token     Logit
ér        -0.16
zt        -0.16
aro       -0.15
enza      -0.14
ères      -0.14
anske     -0.14
://{      -0.14
imid      -0.13
athers    -0.13
azel      -0.13
Positive Logits
Token       Logit
imaginable  0.26
of          0.25
yet         0.24
-ever       0.22
yet         0.22
possible    0.21
possible    0.21
aspects     0.21
Yet         0.20
aspect      0.20
Activations Density 0.078%