INDEX
Explanations
instances where something is described or represented in a particular way
phrases that represent or correlate with interpretations or portrayals of situations
Negative Logits
asers      -0.72
eor        -0.71
quist      -0.70
CHA        -0.68
romy       -0.68
ppa        -0.67
rone       -0.66
reprene    -0.66
ersen      -0.65
ctl        -0.64
Positive Logits
follows      1.18
pires        1.01
criptions    0.94
pired        0.92
phy          0.92
well         0.91
piring       0.87
belonging    0.86
opposed      0.86
occurring    0.83
Activations Density 0.156%
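
As a rough illustration of what a density figure like the one above usually measures, the sketch below computes the fraction of token positions on which a feature fires. It assumes density means "share of tokens with activation above zero"; the page itself does not state the definition, and the function name, threshold parameter, and sample values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a token counts as "active" when its activation is strictly
    greater than `threshold` (0.0 by default). This is an illustrative
    definition, not one taken from the page.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical per-token activations for one feature: 2 of 8 tokens are active.
sample = [0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 0.2, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.156% would mean the feature is active on roughly one or two tokens out of every thousand.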