INDEX
Explanations
words related to stability or a lack of emotion
references to balance or balancing concepts
Negative Logits
Token         Logit
printf        -0.82
rawl          -0.82
deposition    -0.73
thood         -0.71
ribe          -0.71
(blank)       -0.69
Expand        -0.66
iblings       -0.65
XP            -0.65
Profession    -0.64
Positive Logits
Token         Logit
bal           4.01
Bal           2.17
Bal           1.90
bal           1.66
BAL           1.11
ballet        1.07
balancing     1.00
sto           1.00
sab           0.99
peac          0.96
Activations Density: 0.029%