Explanations
expressions of desire or disappointment regarding expectations
Negative Logits
itz       -0.16
HL        -0.15
Wind      -0.14
Winds     -0.14
風        -0.14
urg       -0.13
EG        -0.13
PWD       -0.13
azine     -0.13
wind      -0.13
Positive Logits
场        0.17
oord      0.15
scenario  0.14
Jong      0.14
cen       0.14
TI        0.14
wil       0.14
spm       0.14
noxious   0.14
.ur       0.14
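The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and keep the k lowest and k highest scores. The function name `logit_effects` and the toy inputs are hypothetical; real dashboards may also normalize or filter tokens.

```python
from typing import List, Tuple


def logit_effects(
    decoder_dir: List[float],
    w_u: List[List[float]],
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical sketch: score each token by decoder_dir @ W_U.

    decoder_dir has length d_model; w_u is d_model x vocab_size.
    Returns (most negative k, most positive k) as (token, score) pairs.
    """
    d_model = len(decoder_dir)
    # Dot product of the feature direction with each unembedding column.
    scores = [
        sum(decoder_dir[i] * w_u[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return negative, positive


# Toy example: 2-dimensional residual stream, 3-token vocabulary.
neg, pos = logit_effects([0.5, 0.2], [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]], ["a", "b", "c"], k=1)
print(neg, pos)  # [('c', -0.5)] [('a', 0.5)]
```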
Activation Density: 0.198%
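Activation density is the share of sampled tokens on which the feature is active, so 0.198% means it fires on roughly 2 of every 1,000 tokens. A minimal sketch, assuming "active" means an activation strictly above zero (the dashboard's exact threshold and token sample are not stated here); `activation_density` is a hypothetical helper name.

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Hypothetical sketch: percentage of tokens whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 2 active tokens out of 1,000 gives 0.2%, close to the 0.198% shown above.
print(activation_density([0.0] * 998 + [1.2, 0.3]))
```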