INDEX
Explanations
verbs and phrases related to decision-making and actions taken
Negative Logits
anners  -0.17
LENG    -0.17
lav     -0.15
Mane    -0.15
TU      -0.14
urm     -0.14
anner   -0.13
pac     -0.13
lation  -0.13
Laur    -0.13
Positive Logits
åīĽ      0.15
295      0.15
instead  0.15
iline    0.14
wise     0.14
instead  0.14
uka      0.14
IR       0.14
rather   0.14
SCO      0.13
Activations Density: 0.136%