INDEX
Explanations
expressions of subjective assessments regarding behavior, situations, or conditions
Negative Logits
Token    Logit
683      -0.15
acle     -0.15
pare     -0.15
æ±Ĺ      -0.14
itten    -0.14
esian    -0.14
hape     -0.14
.ide     -0.13
rink     -0.13
æı       -0.13
Positive Logits
Token    Logit
sense    0.44
degree   0.40
sense    0.36
degree   0.31
Sense    0.31
Degree   0.28
Sense    0.27
lack     0.26
measure  0.26
level    0.24
Activations Density: 0.238%