INDEX
Explanations
quantitative measurements or characteristics
phrases that highlight the significance or importance of a subject
Negative Logits
idth      -0.70
berra     -0.64
hello     -0.62
Bradley   -0.61
Carlton   -0.60
Seah      -0.60
hoop      -0.59
ruction   -0.58
inav      -0.58
hoops     -0.57
Positive Logits
unsu          1.03
unus          0.96
susceptible   0.89
uniquely      0.88
unlikely      0.88
ineligible    0.85
unfit         0.82
eligible      0.80
liable        0.80
safer         0.77
Activations Density: 0.096%