Explanations
phrases or sentences asking for information or decisions
questions or phrases beginning with "what."
Negative Logits
Token     Logit
ulic      -0.75
enburg    -0.71
ubs       -0.69
robe      -0.69
ped       -0.67
ster      -0.67
enberg    -0.65
eah       -0.64
aches     -0.63
trop      -0.63
Positive Logits
Token         Logit
happened      1.23
happens       1.14
soever        1.14
kinds         1.07
sorts         1.07
happ          1.06
transpired    1.04
else          0.93
exactly       0.90
constitutes   0.88
Activations Density 0.104%