Explanations
mentions of plans, intentions, or future actions
Negative Logits
Token       Logit
tut         -0.17
Spot        -0.16
Closure     -0.15
|_|         -0.14
onomy       -0.14
بÙĩا        -0.14
iants       -0.14
ambio       -0.14
ÑĢÑı        -0.14
ReadWrite   -0.14
Positive Logits
Token       Logit
work        0.27
worked      0.23
closely     0.23
worked      0.22
work        0.21
brief       0.21
shortly     0.20
monitor     0.20
.work       0.20
continue    0.19
Activations Density: 0.137%