INDEX
Explanations
phrases indicating the potential for improvement or contribution
Negative Logits
umption   -0.76
coe       -0.68
ios       -0.65
leys      -0.65
gravity   -0.64
hower     -0.64
gey       -0.62
obar      -0.62
oak       -0.62
��        -0.62
Positive Logits
answer     0.72
afraid     0.67
IFIED      0.60
ashamed    0.58
STATE      0.58
offer      0.58
_-         0.55
Angelo     0.55
ifies      0.54
learnt     0.54
Activation Density: 0.036%