INDEX
Explanations
phrases related to instructions or commands
specific commands or instructions related to actions
Negative Logits
ageing      -0.72
citiz       -0.66
nails       -0.65
xual        -0.64
thieves     -0.63
aging       -0.62
escaping    -0.61
wolves      -0.57
diving      -0.57
judgement   -0.55
Positive Logits
³³³                 0.99
³³³³³³³³            0.97
³³³³                0.93
³³³³³³³³³³³³³³³³    0.91
³³                  0.72
-.                  0.70
ults                0.66
vious               0.63
îĢ                  0.63
ORD                 0.62
Activations Density: 0.432%