Explanations
verbs related to desires or intentions
Negative Logits
faint          -0.63
mone           -0.61
suicides       -0.58
wheelchair     -0.58
mayors         -0.58
TTL            -0.58
attrition      -0.56
haz            -0.56
hero           -0.55
rid            -0.55
Positive Logits
beforehand     0.80
ãĤ´            0.77
happening      0.74
_.             0.70
ETS            0.69
FontSize       0.68
?:             0.66
besides        0.65
regarding      0.65
advertisement  0.64
Activations Density 0.169%