Explanations
phrases expressing strong feelings or significant experiences
Negative Logits
Token   Logit
ucci    -0.16
qid     -0.16
Alman   -0.15
895     -0.15
isku    -0.14
orsch   -0.14
maj     -0.13
263     -0.13
mam     -0.13
ordes   -0.13

Positive Logits
Token          Logit
agle           0.14
uste           0.14
imd            0.14
embed          0.14
alright        0.14
Specification  0.13
ëĨ             0.13
-tabs          0.13
etail          0.13
eras           0.13

Activation Density: 0.140%