INDEX
Explanations
verbs indicating communication, intention, and assertion
phrases that denote indication, suggestion, or signaling information
Negative Logits
ctors   -0.86
fare    -0.85
@#&     -0.80
zanne   -0.72
ÄŁ      -0.71
sites   -0.70
neys    -0.70
ney     -0.68
ps      -0.67
vas     -0.67
Positive Logits
indications  0.90
signs        0.89
ered         0.83
indication   0.79
Signs        0.77
ively        0.72
otherwise    0.72
indicated    0.72
indicates    0.71
hints        0.71
Activations Density 0.038%
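
The density figure above can be read as the share of sampled token positions on which the feature fires. The sketch below illustrates that reading under stated assumptions: that "fires" means an activation strictly above zero, and that density is a plain count ratio. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: a feature counts as active on a token when its
    activation is greater than zero; the dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 4 of which are active.
sample = [0.0] * 9996 + [1.2, 0.4, 3.1, 0.7]
density = activation_density(sample)

# Format as a percentage with three decimals, matching the dashboard style.
print(f"Activations Density {density:.3%}")
```

A density this low would mean the feature is sparse: it is silent on almost all tokens and active only on a narrow set, which fits the tightly clustered tokens in the Positive Logits list.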