INDEX
Explanations
sentences that express existence or presence of entities and their states
Negative Logits
stairs       -0.69
Pants        -0.69
Ambro        -0.65
ingham       -0.64
Newsletter   -0.63
TRUMP        -0.63
discont      -0.60
Wonderful    -0.59
Nicarag      -0.58
Mesh         -0.58
Positive Logits
wont          0.90
evidenced     0.78
actionGroup   0.78
often         0.73
attest        0.70
witnessed     0.69
previously    0.69
tend          0.68
tends         0.67
ディ          0.66
Activations Density: 0.129%