Explanations
unusual occurrences or deviations from normal patterns
the adverb "normally" indicating typical behavior or conditions
Negative Logits
Token         Logit
Kut           -0.80
Ing           -0.79
ged           -0.70
Yose          -0.66
populism      -0.65
Muse          -0.65
gets          -0.65
Crusade       -0.64
addons        -0.63
Nuggets       -0.63

Positive Logits
Token         Logit
llular         0.94
behaved        0.83
inclined       0.77
disclaim       0.77
partName       0.75
encountered    0.75
adal           0.73
reacted        0.72
hattan         0.72
speaking       0.71
Activations Density 0.013%