INDEX
Explanations
phrases indicating exclusivity or singling out
Negative Logits
Token            Logit
insula           -0.73
ahime            -0.71
heterogeneity    -0.69
illin            -0.65
ulty             -0.64
idon             -0.63
arted            -0.63
raught           -0.62
charism          -0.61
anon             -0.61
Positive Logits
Token            Logit
marginally       1.10
ICES             0.93
ices             0.92
kidding          0.86
ĨĴ               0.86
incidentally     0.83
spor             0.81
£ı               0.79
briefly          0.79
mildly           0.75
Activations Density: 6.006%