INDEX
Explanations
descriptions related to characteristics or qualities
words and phrases related to qualitative descriptions of experiences and emotions
Negative Logits
ornings      -0.58
ificantly    -0.56
Lago         -0.53
inion        -0.53
enture       -0.52
orthy        -0.52
Specific     -0.52
Shades       -0.50
Values       -0.49
unrelated    -0.49
Positive Logits
iest          0.75
liest         0.71
hest          0.59
fraternity    0.58
erest         0.56
osphere       0.55
portion       0.55
menace        0.55
fallacy       0.53
plag          0.53
Activations Density: 1.171%