INDEX
Explanations
quantitative comparisons or superlatives
terms related to magnitude, significance, or impact
Negative Logits
din          -0.70
dfx          -0.66
mere         -0.63
agine        -0.62
ASAP         -0.61
andel        -0.60
deep         -0.58
something    -0.58
someday      -0.58
Moons        -0.58
Positive Logits
overall      0.81
(−           0.75
uptake       0.74
(-           0.73
than         0.73
ikawa        0.71
pronounced   0.70
luster       0.70
asca         0.65
disapprove   0.65
Activations Density: 0.286%