INDEX
Explanations
questions or statements requesting information or feedback
the word "what" and related inquiries
Negative Logits
robe      -0.71
ulic      -0.69
raction   -0.69
ped       -0.66
stead     -0.66
ster      -0.65
ubs       -0.65
por       -0.64
enburg    -0.63
eer       -0.63
Positive Logits
happened     1.13
soever       1.13
happens      1.09
happ         1.08
kinds        1.07
sorts        1.06
transpired   0.89
redes        0.88
kind         0.86
else         0.86
Activations Density 0.120%