INDEX
Explanations
phrases related to deception and manipulation
Negative Logits
Token            Logit
OGND             -0.45
autorytatywna    -0.35
sécher           -0.32
esperanza        -0.30
Slaughter        -0.28
symp             -0.28
ttp              -0.28
пры              -0.28
__(/*!           -0.28
espoir           -0.27
Positive Logits
Token            Logit
report           0.71
ReusableCell     0.66
report           0.65
Report           0.63
報告             0.62
REPORT           0.62
&___             0.61
汇报             0.60
Report           0.60
presentation     0.58
Activations Density: 0.337%