INDEX
Explanations
instances where there is an exception or deviation from the norm
phrases that indicate exceptions or deviations from the norm
Negative Logits
horizont     -0.66
nesday       -0.65
Majesty      -0.63
Keys         -0.62
Coffee       -0.60
foot         -0.59
scrimmage    -0.58
anon         -0.58
canvas       -0.57
Sta          -0.57
Positive Logits
behaved      0.80
worldly      0.75
heastern     0.73
inclined     0.69
omon         0.68
龍喚士       0.67
NESS         0.66
ghazi        0.65
scill        0.64
OSED         0.63
Activations Density 0.019%