INDEX
Explanations
words and phrases related to responsibility and action in a structured context
Negative Logits
bookmark  -0.16
jak       -0.16
esel      -0.14
iid       -0.14
berg      -0.14
egal      -0.14
hower     -0.14
eg        -0.14
aded      -0.14
icha      -0.14
Positive Logits
UME       0.15
rie       0.14
BILE      0.14
åĽ        0.14
DeV       0.13
ume       0.13
Nem       0.13
VEC       0.13
izes      0.13
ige       0.13
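Logit lists like the two above are commonly produced by taking the feature's decoder direction, dotting it with every token's unembedding vector, and keeping the most positive and most negative results. The sketch below shows that ranking step under this assumption. The page does not say whether extra scaling (for example a final layer norm) is applied first. The function names `logit_effects` and `top_logits`, the token names, and the 2-dimensional toy vectors are all hypothetical.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_logits(effects: Dict[str, float], k: int, positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k most positive effects, or the k most negative when positive=False."""
    # Descending order for positive logits, ascending (most negative first) otherwise.
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


# Hypothetical toy model with d_model = 2; real vectors are d_model long.
decoder_dir = [1.0, -0.5]
unembed = {
    "tok_a": [0.10, -0.10],
    "tok_b": [-0.10, 0.10],
    "tok_c": [0.05, 0.00],
}
effects = logit_effects(decoder_dir, unembed)
print(top_logits(effects, k=2, positive=True))   # tok_a ranks first
print(top_logits(effects, k=2, positive=False))  # tok_b ranks first
```

Sorting the full vocabulary is fine for a sketch. On a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` avoids the full sort.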
Activation density: 0.166%
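Activation density is usually read as the share of sampled token positions on which the feature fires. A density of 0.166% corresponds to roughly 166 active positions per 100,000 tokens. The sketch below assumes "fires" means an activation strictly above zero. The dashboard does not state its threshold or sample size, and the function name `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Made-up sample: 166 firing positions out of 100,000 tokens.
sample = [0.0] * 99_834 + [1.0] * 166
print(f"{activation_density(sample):.3f}%")  # prints 0.166%
```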