Explanations
phrases that express initial reactions or feelings about situations
Negative Logits
eland      -0.15
either     -0.14
c          -0.14
傍         -0.14
today      -0.14
o          -0.14
rarely     -0.14
now        -0.13
eri        -0.13
alent      -0.13
Positive Logits
initially   0.22
Initially   0.19
Initially   0.19
нач         0.17
>",         0.16
rig         0.15
opher       0.14
uary        0.14
enna        0.14
/release    0.14
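Scores like those in the two logit lists are commonly read as a feature's direct effect on the output logits. The following is a minimal sketch, assuming that each token's score is the dot product of the feature's decoder direction with that token's unembedding column. The page does not say how its values were computed. The helpers `logit_effects` and `top_and_bottom` and the toy numbers are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_row: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Hypothetical: score each token as decoder_row . unembed[:, j].

    unembed is laid out as d_model rows by len(vocab) columns (an assumption).
    """
    effects: Dict[str, float] = {}
    for j, tok in enumerate(vocab):
        effects[tok] = sum(decoder_row[i] * unembed[i][j] for i in range(len(decoder_row)))
    return effects


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most negative k, most positive k), each ordered by magnitude."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]          # most negative scores first
    top = ranked[::-1][:k]       # most positive scores first
    return bottom, top


# Toy example with a 2-dimensional model and a 3-token vocabulary (made-up numbers).
effects = logit_effects(
    decoder_row=[1.0, 0.5],
    unembed=[[0.2, -0.1, 0.0], [0.0, 0.2, -0.3]],
    vocab=["initially", "now", "either"],
)
negative, positive = top_and_bottom(effects, k=1)
print(negative, positive)
```

Under this reading, a positive score means the feature pushes the model toward predicting that token, and a negative score means it pushes the model away from that token.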
Activation density: 0.058%
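Activation density is the share of token positions on which the feature is active. The following is a minimal sketch, assuming that "active" means an activation strictly above a threshold of zero. The page does not state its threshold, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# 58 firing positions out of 100,000 tokens (synthetic data).
density = activation_density([0.0] * 99_942 + [1.0] * 58)
print(format(density, ".3%"))
```

At 0.058%, the feature fires on roughly 58 of every 100,000 tokens, so it is a sparse feature.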