Explanations
words related to definitions or explanations
phrases that define various concepts and terms
Negative Logits
outweigh    -0.83
umbn        -0.73
heels       -0.72
BLIC        -0.72
aldi        -0.70
ibling      -0.69
ersen       -0.69
sync        -0.68
applaud     -0.67
ÃĥÃĤ        -0.64
Positive Logits
CoC          0.80
boundaries   0.79
thresholds   0.72
defining     0.71
initions     0.71
Characters   0.70
meaning      0.69
definitions  0.68
Species      0.68
Category     0.67
Activations Density 0.164%