Explanations
numerical ratings or evaluations
Negative Logits
Token       Logit
capitals    -0.73
advis       -0.70
roam        -0.67
stewards    -0.67
budgets     -0.66
ussion      -0.65
etheless    -0.65
leap        -0.64
purse       -0.63
dotted      -0.63
Positive Logits
Token       Logit
J           1.03
E           0.97
O           0.95
A           0.91
Y           0.90
Va          0.86
C           0.85
K           0.84
L           0.83
D           0.83
Activations Density: 0.031%