Explanations
physical struggle and sensation
Negative Logits
Token       Logit
hand        -0.09
æĬ±         -0.09
pa          -0.09
wner        -0.09
PG          -0.09
Kub         -0.09
collapse    -0.09
éľ          -0.09
Quar        -0.08
Wing        -0.08
Positive Logits
Token       Logit
gag         0.17
bound       0.15
struggles   0.15
Bound       0.13
struggle    0.12
Shack       0.12
Bound       0.12
struggled   0.12
-bound      0.12
bound       0.12
Activations Density 0.028%