INDEX
Explanations
phrases indicating feelings of guilt or accusations of wrongdoing
Negative Logits
Token      Logit
TRACE      -0.08
erli       -0.08
Trace      -0.08
bay        -0.08
æĤł        -0.07
лож        -0.07
ë³         -0.07
è½         -0.07
Trace      -0.07
ÑĥлÑĥÑĩ     -0.07
Positive Logits
Token       Logit
catch       0.06
both        0.06
islav       0.06
ilver       0.06
definition  0.06
catch       0.06
mith        0.06
cher        0.06
revis       0.06
opyright    0.05
Activations Density: 0.002%