Explanations
instances of repetition or similarity in phrases or descriptions
words associated with correctness or appropriateness in various contexts
NEGATIVE LOGITS
ļéĨĴ    -0.81
antam    -0.73
ntil    -0.69
ADRA    -0.68
ibal    -0.68
quished    -0.67
ij    -0.64
â̦â̦â̦â̦â̦â̦â̦â̦    -0.63
Introdu    -0.63
Minimum    -0.62
POSITIVE LOGITS
money    0.75
smells    0.70
messenger    0.70
manner    0.69
livelihood    0.68
timetable    0.68
histories    0.68
manners    0.68
colors    0.67
geography    0.66
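The two lists above are the ten lowest and ten highest logit values associated with this feature. A minimal sketch of how such lists can be selected, assuming you already have a mapping from each vocabulary token to its logit value (for example, from projecting the feature's direction through the model's unembedding, which is an assumption here, not something stated on this page). The function name `top_bottom_logits` and the dict-based input are hypothetical; the sketch only does the ranking step.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]

def top_bottom_logits(logits: Dict[str, float], k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Hypothetical helper: return the k highest and k lowest (token, logit) pairs."""
    # Sort all tokens by logit value, lowest first.
    ranked = sorted(logits.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]        # most negative logits
    top = ranked[::-1][:k]     # most positive logits, highest first
    return top, bottom

# A small subset of the values listed above, used only for illustration.
example = {"money": 0.75, "smells": 0.70, "manner": 0.69, "Minimum": -0.62, "antam": -0.73}
top, bottom = top_bottom_logits(example, k=2)
```

With `k=2` on this subset, `top` holds `money` and `smells`, and `bottom` holds `antam` and `Minimum`.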
Activation density: 0.717%
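A density figure like this is commonly read as the share of sampled token positions on which the feature is active. A minimal sketch, assuming "active" means an activation strictly above zero and that density is reported as a percentage of all sampled positions; the helper name `activation_density` and the sample activations are hypothetical, not data from this feature.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count positions where the feature fires above the (assumed) threshold.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Made-up sample: 4 of 1000 positions are active, so density is 0.4%.
acts = [0.0] * 995 + [1.2, 0.4, 3.1, 0.0, 0.9]
density = activation_density(acts)
```

Under these assumptions, a density of 0.717% would mean the feature fires on roughly 7 of every 1,000 sampled tokens.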