INDEX
Explanations
the word "possible" followed by a high activation number
phrases that express possibilities or potential outcomes
Negative Logits
bane      -0.79
cipl      -0.77
bey       -0.77
phas      -0.75
waters    -0.70
zac       -0.70
ophone    -0.70
eye       -0.67
roma      -0.67
cig       -0.66
Positive Logits
¶                 0.77
FORE              0.73
NK                0.70
exit              0.70
terday            0.69
ossibility        0.65
partName          0.65
understatement    0.64
lly               0.62
nevertheless      0.61
Activations Density: 0.021%