INDEX
Explanations
sequences of letters followed by a number
references to specific entities or names
Negative Logits
dere         -0.66
Idlib        -0.63
impunity     -0.63
Homo         -0.61
Pharaoh      -0.60
Gutenberg    -0.57
laughing     -0.56
face         -0.56
foil         -0.56
departure    -0.55
Positive Logits
odies        1.22
ishop        1.22
ranch        1.20
ureau        1.16
ODY          1.15
udget        1.14
rief         1.13
ridges       1.11
irds         1.11
ooth         1.11
Activations Density: 0.044%