Explanations
phrases that indicate transformation or change
Negative Logits
maxHeight    -0.15
101          -0.15
ää           -0.15
agoon        -0.14
past         -0.14
ãĥł          -0.14
uktur        -0.14
ovid         -0.14
emade        -0.14
upal         -0.13
Positive Logits
aram         0.16
ÃŃd          0.16
áno          0.15
ARAM         0.14
EMA          0.14
æ±Ĥè´Ń       0.14
tail         0.14
ects         0.13
mos          0.13
ãģĵãĤį       0.13
Activations Density: 0.058%