INDEX
Explanations
phrases related to interactions and prompts for verification or confirmation
Head Attr Weights
0:0.17
1:0.02
2:0.17
3:0.18
4:0.03
5:0.06
6:0.03
7:0.09
8:0.08
9:0.01
10:0.08
11:0.02
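The per-head weights listed here are easier to compare once ranked. A minimal Python sketch, assuming each entry is a `head:weight` pair exactly as listed; the variable names are illustrative and not part of the dashboard:

```python
# Head attribution weights copied from the listing (head index -> weight).
head_weights = {
    0: 0.17, 1: 0.02, 2: 0.17, 3: 0.18, 4: 0.03, 5: 0.06,
    6: 0.03, 7: 0.09, 8: 0.08, 9: 0.01, 10: 0.08, 11: 0.02,
}

# Rank heads from largest to smallest weight; sorted() is stable,
# so tied heads keep their index order.
ranked = sorted(head_weights.items(), key=lambda kv: -kv[1])

# Share of the listed total carried by the three largest heads.
total = sum(head_weights.values())
top3 = ranked[:3]
top3_share = sum(w for _, w in top3) / total

print("top heads:", [h for h, _ in top3])
print(f"listed total: {total:.2f}, top-3 share: {top3_share:.2f}")
```

The listed weights are rounded to two decimals, so the total computed this way need not equal exactly 1 even if the underlying weights are normalized.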
Negative Logits
Tend            -2.10
negotiators     -2.01
hunt            -2.01
deliberations   -2.00
oranges         -1.93
wives           -1.92
Leaders         -1.90
agendas         -1.89
priorit         -1.84
Directors       -1.84
Positive Logits
login           2.40
registered      2.19
raud            2.18
Error           2.17
serial          2.09
error           2.08
Error           2.08
(token text not shown)  2.07
verification    2.06
Login           2.06
Activations Density 0.006%
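For scale, a density of 0.006% corresponds to roughly 6 activating tokens per 100,000. The sketch below assumes that density means the fraction of sampled tokens on which the feature's activation is non-zero; the dashboard does not state its definition, and `activation_density` is a hypothetical helper written for illustration:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Synthetic data: 6 firing tokens out of 100,000, chosen to match 0.006%.
sample = [0.0] * 99_994 + [1.5] * 6
density = activation_density(sample)
print(f"{density * 100:.3f}%")
```

A feature this sparse fires on very few tokens, which fits the narrow set of login, error, and verification tokens among the positive logits.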