INDEX
Explanations
the word "ob"
the presence of the word "ob"
Negative Logits
ivities     -0.85
imates      -0.74
yards       -0.69
cort        -0.67
Orient      -0.66
³³³³³³³³    -0.65
Angelo      -0.62
ORIG        -0.62
Painter     -0.62
Divinity    -0.62
Positive Logits
acter       1.22
server      1.18
edient      1.17
lique       1.14
edience     1.14
acco        1.12
rien        1.09
bing        1.08
livion      1.07
fusc        1.06
Activations Density: 0.032%