INDEX
Explanations
themes related to interpersonal relationships and emotional complexity
Negative Logits
maf        -0.14
usto       -0.14
092        -0.14
Circle     -0.13
aml        -0.13
ijing      -0.13
RefCount   -0.12
utches     -0.12
åĪĻ        -0.12
raig       -0.12
Positive Logits
anyway        0.94
Anyway        0.83
anyways       0.82
Anyway        0.78
anyhow        0.74
any           0.39
toch          0.38
any           0.38
nevertheless  0.36
nonetheless   0.36
Activations Density: 0.652%