Explanations
positive adjectives describing qualities and characteristics
Negative Logits
sdale        -0.15
recision     -0.14
roker        -0.14
.vn          -0.14
forman       -0.13
æŀģ          -0.13
ismu         -0.13
multiline    -0.13
isex         -0.13
omination    -0.13
Positive Logits
amount         0.30
sense          0.26
understanding  0.25
grasp          0.25
following      0.25
amount         0.25
degree         0.24
handle         0.24
level          0.23
Amount         0.22
Activations Density 0.162%
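
The density figure above is usually read as the fraction of token positions on which the feature fires at all. The sketch below shows that arithmetic under stated assumptions: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard or from any particular library.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: a feature counts as "active" at a position when its
    activation is strictly greater than threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 16 of which activate the feature.
acts = [0.0] * 9984 + [1.5] * 16
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.160%
```

A density near 0.16% would mean the feature is sparse: on this reading it fires on roughly one or two tokens per thousand.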