INDEX
Explanations
instances of emotional or subjective evaluations about situations or experiences
Negative Logits
aida          -0.18
isc           -0.15
croft         -0.14
anja          -0.14
edl           -0.14
olley         -0.14
reportedly    -0.14
Holl          -0.14
ï¸            -0.14
enic          -0.14
Positive Logits
rog           0.17
endless       0.15
-random       0.15
aravel        0.14
ATUS          0.14
onen          0.14
quite         0.14
olor          0.14
lopen         0.14
lem           0.14
Activations Density 0.100%