Explanations
phrases related to abstract concepts such as principles and structures
concepts related to physical attributes and characteristics
Negative Logits
suspended  -0.60
Badge  -0.58
corrid  -0.58
iann  -0.56
Jagu  -0.55
pard  -0.55
Panthers  -0.55
digitally  -0.54
approved  -0.54
Jem  -0.53
Positive Logits
Reviewer  0.76
but  0.71
proportions  0.69
sake  0.68
comparisons  0.68
insofar  0.66
yet  0.66
arist  0.65
smanship  0.64
due  0.63
Activation density: 0.429%