INDEX
Explanations
emotional states and interpersonal dynamics
Negative Logits
riott    -0.17
rud      -0.16
obst     -0.15
ahan     -0.14
iou      -0.14
mitt     -0.14
mocker   -0.14
plan     -0.14
AMENT    -0.14
bern     -0.14
Positive Logits
nik      0.17
dó       0.15
etas     0.14
bows     0.14
defer    0.14
coll     0.14
Hooks    0.14
teslim   0.14
succ     0.14
uffer    0.14
Activations Density: 0.220%