INDEX
Explanations
terms related to dictionaries
references to dictionaries and definitions
Negative Logits
ments         -0.86
mented        -0.77
arnaev        -0.71
vertising     -0.70
urion         -0.70
Antar         -0.69
meat          -0.68
cedented      -0.67
warm          -0.66
anced         -0.64
Positive Logits
Dictionary    1.31
dictionary    1.07
definitions   0.97
textbook      0.94
initions      0.91
Encyclopedia  0.91
pedia         0.90
Britann       0.89
ictionary     0.85
reference     0.84
Activations Density: 0.088%