INDEX
Explanations
phrases indicating high agreement or significant proportions
phrases indicating proportions or statistical data
Negative Logits
tyr              -0.87
compr            -0.60
Shad             -0.57
Reconstruction   -0.55
McAuliffe        -0.55
Trailer          -0.54
ADRA             -0.53
redes            -0.53
arsen            -0.53
Nadu             -0.52
Positive Logits
of                1.12
of                0.97
ta                0.96
Of                0.93
paced             0.86
fitted            0.86
stri              0.84
ranked            0.84
Of                0.81
scoring           0.81
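In many SAE dashboards, the positive and negative logit lists show the tokens whose output logits the feature's decoder direction raises or lowers the most when that direction is projected through the model's unembedding matrix. The sketch below assumes that convention, which this dashboard does not confirm. The helper names `logit_effects` and `top_k`, the 2-dimensional decoder vector, and the 4-token unembedding matrix are all hypothetical toy values, not data from this feature.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float], w_unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix W_U of shape (d_model, vocab); returns one score per token.
    Assumption: this is how the dashboard's logit values are derived."""
    d_model = len(decoder_dir)
    vocab = len(w_unembed[0])
    return [
        sum(decoder_dir[i] * w_unembed[i][t] for i in range(d_model))
        for t in range(vocab)
    ]


def top_k(tokens: Sequence[str], scores: Sequence[float], k: int, largest: bool = True) -> List[Tuple[str, float]]:
    """Return the k (token, score) pairs with the largest scores, or the most
    negative scores when largest=False."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1], reverse=largest)
    return ranked[:k]


# Hypothetical toy example: 2-dimensional residual stream, 4-token vocabulary.
tokens = ["of", "ranked", "tyr", "Nadu"]
decoder_dir = [1.0, -0.5]
w_unembed = [
    [0.8, 0.6, -0.5, -0.2],
    [-0.4, 0.0, 0.6, 0.4],
]
scores = logit_effects(decoder_dir, w_unembed)
print("positive:", top_k(tokens, scores, k=2, largest=True))
print("negative:", top_k(tokens, scores, k=2, largest=False))
```

Sorting all scores once and slicing is enough for a small vocabulary. For a real vocabulary of tens of thousands of tokens, a partial selection such as a heap or a tensor top-k routine avoids a full sort.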
Activation Density: 0.032%
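Suppose activation density means the share of token positions on which the feature's activation is positive. That is a common SAE dashboard convention but is assumed here, not stated. Under that reading, 0.032% corresponds to about 32 of every 100,000 tokens, or roughly 1 token in 3,125. The following is a minimal sketch with a hypothetical `activation_density` helper and synthetic activations:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions at which the feature fires
    (activation strictly greater than `threshold`)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Synthetic data: 100,000 token positions, 32 of them firing.
acts = [0.0] * 100_000
for pos in range(0, 100_000, 3_125):  # 32 evenly spaced firing positions (made-up data)
    acts[pos] = 1.5
print(f"{activation_density(acts):.3f}%")
```

The `threshold` parameter is an illustrative addition. Some tools count any nonzero activation, while others apply a small cutoff, and the choice changes the reported density for rarely firing features.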