INDEX
Explanations
phrases related to commenting on individual cases and policies
Negative Logits
EEDED     -0.15
issan     -0.14
entai     -0.14
orton     -0.14
aptop     -0.13
λοι       -0.13
interes   -0.13
ーー      -0.13
inic      -0.12
Âį        -0.12
Positive Logits
specific     0.40
specifics    0.38
specific     0.32
individual   0.32
-specific    0.31
específ      0.31
具体         0.30
Specific     0.29
Specific     0.28
ongoing      0.26
Activations Density 0.056%
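
Token strings in dashboards like this one are often shown in the tokenizer's byte-level form, where non-ASCII text appears as sequences such as `especÃŃf` or `åħ·ä½ĵ`. The following is a minimal sketch of how such strings can be mapped back to readable text. It assumes the GPT-2 style byte-to-unicode table, which the page itself does not state; the names `bytes_to_unicode`, `UNICODE_TO_BYTE`, and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the GPT-2 style byte -> printable-character table (assumed here)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is assigned a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into readable text.

    Tokens containing characters outside the table (already-decoded text
    such as Greek letters) are returned unchanged.
    """
    if any(ch not in UNICODE_TO_BYTE for ch in token):
        return token
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Incomplete UTF-8 sequences become U+FFFD instead of raising an error.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["especÃŃf", "åħ·ä½ĵ", "ãĥ¼ãĥ¼", "λοι"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, `especÃŃf` decodes to `específ`, `åħ·ä½ĵ` to `具体`, and `ãĥ¼ãĥ¼` to `ーー`. The first two are Spanish and Chinese forms of "specific", which fits the other positive-logit tokens.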