INDEX
Explanations
references to contradiction and complexity in human relationships
NEGATIVE LOGITS
unfortunately    -0.19
Unfortunately    -0.19
Sadly            -0.16
sadly            -0.16
Unfortunately    -0.16
çIJ´             -0.14
Geile            -0.14
Dabei            -0.14
Sadly            -0.14
uze              -0.14
POSITIVE LOGITS
Still            0.68
still            0.64
Still            0.62
Nevertheless     0.61
Nonetheless      0.59
nevertheless     0.58
nonetheless      0.58
Nevertheless     0.58
STILL            0.55
still            0.52
Activations Density: 0.348%