Explanations
proper nouns related to a specific entity or concept
the mention of a specific term related to a cultural or social concept
Negative Logits
almonds
-0.71
withholding
-0.66
terday
-0.63
IVES
-0.61
meal
-0.61
behold
-0.61
Highlander
-0.61
¿½
-0.60
preferential
-0.60
braces
-0.60
Positive Logits
ultane          1.60
psons           1.36
iliar           1.36
ulators         1.34
ulations        1.32
ply             1.26
pler            1.25
ulator          1.22
pson            1.21
ulation         1.19
Activations Density 0.034%