INDEX
Explanations
instances of actions and consequences within various unfamiliar texts
Negative Logits
endif          -0.62
ciating        -0.57
fortunately    -0.54
depending      -0.53
assuming       -0.53
erm            -0.52
gradation      -0.51
Depending      -0.50
fter           -0.49
Air            -0.49
Positive Logits
illegal         0.61
sake            0.60
improper        0.57
purposes        0.50
tein            0.50
insensitive     0.49
nonviolent      0.48
wrongly         0.48
アル            0.48
insufficient    0.47
Activations Density 11.308%