INDEX
Explanations
phrases indicating concern or consideration about a situation
Negative Logits
ulous     -0.16
339       -0.15
licens    -0.15
ling      -0.15
ling      -0.14
ůl        -0.13
ogan      -0.13
شتر       -0.13
337       -0.13
ä         -0.13
Positive Logits
iets      0.16
елен      0.15
gings     0.15
acker     0.15
cÃł       0.15
ampie     0.15
ackers    0.14
piler     0.14
CRY       0.14
akah      0.14
Activations Density 0.074%