INDEX
Explanations
adjectives related to characteristics or qualities
references to various forms of written content or narratives
Negative Logits
xtap      -0.67
Seventh   -0.65
sugg      -0.61
Phill     -0.60
Branch    -0.59
Pru       -0.58
verty     -0.57
å¼        -0.57
529       -0.57
Tick      -0.56
Positive Logits
nonetheless   1.10
itself        0.98
anyway        0.78
ourselves     0.77
anyways       0.76
oneself       0.76
alike         0.75
nesses        0.75
ain           0.74
herself       0.73
Activations Density 0.414%