INDEX
Explanations
expressions of sentiment or characterization related to interpersonal relationships
Negative Logits
Token         Logit
OK            -0.15
mates         -0.14
alu           -0.14
consort       -0.14
citizenship   -0.14
OK            -0.14
Claus         -0.14
imperson      -0.14
mug           -0.14
inhib         -0.14
Positive Logits
Token         Logit
pha           0.16
cia           0.16
ande          0.15
ả             0.15
Wick          0.15
etta          0.14
screens       0.14
Gard          0.14
wick          0.14
NG            0.14
Activations Density: 0.009%