INDEX
Explanations
phrases related to communication and interaction between individuals
language related to conflict and accusations
Negative Logits
iasm        -0.74
erning      -0.72
xtap        -0.72
ctory       -0.69
ornings     -0.67
cour        -0.65
¯¯¯¯¯¯¯¯    -0.64
though      -0.64
empl        -0.63
ayers       -0.63
Positive Logits
yours       0.96
..."        0.94
…"          0.92
!'"         0.91
…"          0.89
.'"         0.88
______      0.86
ours        0.84
somebody    0.81
..."        0.80
Activations Density 0.753%