INDEX
Explanations
phrases indicating relationships and comparisons
Negative Logits
rap          -0.06
rus          -0.06
.            -0.06
ilk          -0.05
jong         -0.05
Ế            -0.05
rada         -0.05
hurst        -0.05
377          -0.05
Utilities    -0.05
Positive Logits
sense        0.19
strict       0.17
Sense        0.16
sense        0.16
Sense        0.15
sentido      0.15
strictly     0.14
meaning      0.14
broad        0.14
literal      0.14
Activations Density: 0.032%