INDEX
Explanations
expressions indicating potential change or transition
Negative Logits
viar        -0.15
eba         -0.15
validated   -0.15
agara       -0.15
egot        -0.15
adera       -0.14
Cube        -0.14
Bear        -0.14
edo         -0.14
INU         -0.13
Positive Logits
remedy      0.21
shaw        0.18
rect        0.18
remedies    0.17
remed       0.17
changed     0.17
angl        0.15
Теп         0.15
endif       0.14
changed     0.14
Activations Density: 0.164%