INDEX
Explanations
neutral descriptive terms or statements
Negative Logits
streng      -0.67
avorite     -0.67
wana        -0.65
idden       -0.62
habi        -0.61
ascus       -0.61
airs        -0.61
iverpool    -0.61
sung        -0.61
asca        -0.61
Positive Logits
uality      1.11
ually       1.05
orial       1.04
ional       1.03
fulness     0.78
finder      0.78
oids        0.76
ual         0.76
liest       0.75
uation      0.74
Activations Density: 0.027%