INDEX
Explanations
mentions of specific actions or situations
instances of listening or hearing
Negative Logits
never         -0.70
;;;;;;;;;;;;  -0.70
;;;;;;;;      -0.69
nonetheless   -0.69
redes         -0.65
although      -0.64
complied      -0.64
etheless      -0.63
remained      -0.62
nevertheless  -0.62
Positive Logits
somebody    0.72
isEnabled   0.71
someone     0.69
pires       0.60
someone     0.58
adversity   0.56
something   0.56
disrespect  0.55
context     0.54
new         0.54
Activations Density: 0.352%