INDEX
Explanations
adjectives describing qualities or characteristics
expressions indicating certainty or assessment of condition
Negative Logits
aste      -0.62
cot       -0.62
Sierra    -0.62
regate    -0.61
scl       -0.59
ries      -0.58
enment    -0.57
ussions   -0.56
rowth     -0.56
ourses    -0.56
Positive Logits
hett             0.72
ðŁ               0.67
ãĥķãĤ©           0.66
understatement   0.64
cousins          0.63
aware            0.62
INO              0.61
GoldMagikarp     0.60
obser            0.59
Beir             0.58
Activations Density: 0.267%