Explanations
locations or distances expressed in units
quantitative measurements and comparisons
Negative Logits
Token        Logit
eret         -0.79
OOL          -0.67
PLA          -0.65
hib          -0.64
inburgh      -0.64
rums         -0.64
LIB          -0.62
udeb         -0.62
ivities      -0.61
itual        -0.61
Positive Logits
Token        Logit
apiece       1.07
increments   0.75
worth        0.74
fen          0.74
squared      0.73
per          0.73
shy          0.72
lihood       0.71
ago          0.71
advance      0.68
Activations Density 0.239%