Explanations
Phrases related to legal and investigative actions, cautionary statements, and negative assertions about honesty and integrity
Warnings or cautions related to legal or social issues
Negative Logits
Token       Logit
ggles       -0.74
ortium      -0.72
ishable     -0.67
Pinball     -0.66
ibaba       -0.66
inav        -0.66
escent      -0.63
eworld      -0.63
undrum      -0.62
erenn       -0.62
Positive Logits
Token          Logit
unfairly       0.75
outwe          0.74
ineffective    0.73
misinterpret   0.72
instead        0.67
entimes        0.67
Coun           0.66
instead        0.66
retribution    0.65
Instead        0.64
Activation Density: 1.771%