INDEX
Explanations
food items and locations
lists or categorizations of items and activities
Negative Logits
exha            -0.67
misunder        -0.65
interstitial    -0.65
Reward          -0.65
¬¼              -0.61
:(              -0.59
enthusi         -0.58
iple            -0.58
quist           -0.57
bishop          -0.57
Positive Logits
huh             1.11
etc             1.04
albeit          1.01
alas            0.97
eh              0.91
however         0.91
preferably      0.89
meanwhile       0.88
though          0.84
yes             0.82
Activations Density 0.754%