INDEX
Explanations
statements indicating a return to a previous or original state
references to recovery or restoration processes
Negative Logits
nikov        -0.72
andals       -0.70
igated       -0.64
bureaucracy  -0.63
udos         -0.63
divor        -0.62
neys         -0.62
llo          -0.61
hoe          -0.61
anecdotes    -0.61
Positive Logits
equilibrium  1.22
position     1.19
optimum      1.08
shape        1.03
shape        1.02
optimal      0.99
normal       0.98
normal       0.98
Position     0.98
Pos          0.95
Activations Density: 0.296%