INDEX
Explanations
positions of authority and their actions or statements
Negative Logits
Token       Logit
fw          -0.64
ãģį         -0.63
ankind      -0.60
Compar      -0.59
ãģı         -0.55
Condition   -0.55
ãĥ¼ãĥ³      -0.54
Tokens      -0.54
Required    -0.52
Reason      -0.52
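The entries ãģį, ãģı and ãĥ¼ãĥ³ in the negative logits list look like raw vocabulary strings rather than corrupted text. The sketch below shows how such strings could be decoded, assuming (the page does not say so) that the model's tokenizer uses the GPT-2-style byte-level BPE alphabet, in which every byte is written as one printable character. The helper names `byte_level_alphabet` and `decode_token` are hypothetical and chosen only for this illustration.

```python
def byte_level_alphabet():
    """Build the byte -> printable-character table of GPT-2-style byte-level BPE.

    Assumption: the tokenizer behind this dashboard uses this alphabet.
    """
    # Bytes that are already printable keep their own code point.
    keep = (list(range(ord("!"), ord("~") + 1))
            + list(range(0xA1, 0xAC + 1))
            + list(range(0xAE, 0xFF + 1)))
    byte_to_char = {b: chr(b) for b in keep}
    # Every other byte is shifted to a code point starting at 256.
    shift = 0
    for b in range(256):
        if b not in byte_to_char:
            byte_to_char[b] = chr(256 + shift)
            shift += 1
    return byte_to_char


# Inverse table: printable character -> original byte value.
CHAR_TO_BYTE = {c: b for b, c in byte_level_alphabet().items()}


def decode_token(token):
    """Map a vocabulary string back to bytes, then decode those bytes as UTF-8."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # A single token may hold only part of a multi-byte character,
    # so undecodable bytes are replaced instead of raising.
    return raw.decode("utf-8", errors="replace")


for tok in ["ãģį", "ãģı", "ãĥ¼ãĥ³", "Tokens"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, the three strings decode to the Japanese kana き, く and ーン, while plain ASCII tokens such as Tokens are unchanged.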
Positive Logits
Token       Logit
Rahul       0.79
warns       0.78
Joined      0.75
Says        0.74
extraord    0.73
Gina        0.71
admits      0.71
announces   0.71
says        0.70
denies      0.69
Activations Density: 0.217%