INDEX
Explanations
numeric figures or data descriptions
Negative Logits
Token         Logit
ibaba         -0.74
cyclopedia    -0.70
00000         -0.68
artney        -0.67
izabeth       -0.66
emies         -0.65
VICE          -0.64
ription       -0.64
lopp          -0.64
Sussex        -0.63
Positive Logits
Token         Logit
skating       0.99
prominently   0.94
heads         0.92
head          0.84
book          0.73
table         0.71
hatt          0.71
Rasmussen     0.69
sonian        0.68
inery         0.68
Activations Density: 0.726%