INDEX
Explanations
instances of explosive or sudden changes in context or emotion
Negative Logits
nez       -0.16
etchup    -0.16
amb       -0.16
inger     -0.15
ROTO      -0.15
دارÛĮ     -0.15
ãĥ³ãĥĦ    -0.14
ibe       -0.14
inati     -0.14
umb       -0.14
Positive Logits
burst     0.19
bursts    0.18
743       0.17
/exp      0.17
Burst     0.17
agen      0.16
burst     0.16
AWN       0.16
yster     0.16
myth      0.15
Activations Density: 0.074%