Explanations
emphasized terms or key phrases that indicate strong emotional or evaluative language
Negative Logits
EMA            -0.18
rome           -0.17
ema            -0.17
kening         -0.16
oad            -0.16
Gree           -0.15
itudes         -0.15
ILA            -0.15
æ³Ľ            -0.15
Insensitive    -0.14
Positive Logits
Th             0.15
(blank)        0.15
orum           0.14
Rout           0.14
drift          0.14
aptic          0.14
ied            0.14
fing           0.14
dirt           0.14
iffe           0.13
Activations Density 0.005%