INDEX
Explanations
phrases related to future intentions or strategies
Negative Logits
Token    Logit
anca     -0.68
int      -0.67
weed     -0.67
inton    -0.66
tha      -0.60
clad     -0.60
idge     -0.60
court    -0.60
aff      -0.60
cius     -0.59
Positive Logits
Token          Logit
Parenthood     1.12
paren          0.78
accordingly    0.75
¶              0.73
obs            0.72
escape         0.71
pregnancies    0.69
§              0.68
ÄŁ             0.68
regret         0.66
Activations Density: 10.702%