INDEX
Explanations
phrases indicating setting objectives or goals
phrases indicating the intention or purpose of actions
Negative Logits
eries          -0.65
ilege          -0.62
antha          -0.61
ery            -0.61
outage         -0.60
atching        -0.60
avorable       -0.60
essee          -0.59
interstitial   -0.59
leness         -0.59
Positive Logits
anew           0.87
fitted         0.85
posts          0.73
gow            0.68
llor           0.65
tracks         0.64
AAF            0.64
Goal           0.64
¯¯¯¯¯¯¯¯       0.64
ngth           0.63
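Lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and ranking the resulting per-token scores. The sketch assumes that definition, which this dashboard does not state. `top_logit_tokens`, `decoder_dir`, `unembed`, and `vocab` are hypothetical names, and the toy numbers in the usage line are invented, not taken from this feature.

```python
def top_logit_tokens(decoder_dir, unembed, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature (assumed definition).

    decoder_dir: list of d_model floats, the feature's decoder vector (hypothetical).
    unembed: d_model rows, each a list of len(vocab) floats (W_U).
    vocab: list of token strings, one per unembedding column.
    Returns (most_negative, most_positive), each a list of (token, score) pairs.
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per decoder dimension")
    if any(len(row) != len(vocab) for row in unembed):
        raise ValueError("each unembed row must have one column per token")

    # Score for token t is the dot product of decoder_dir with column t of W_U.
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(len(vocab))
    ]

    # Sort ascending by score: the front holds the most negative tokens,
    # the back holds the most positive ones.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


# Toy usage with a 2-dimensional model and a 3-token vocabulary.
neg, pos = top_logit_tokens([1.0, 0.5], [[0.2, -0.4, 0.9], [0.6, 0.2, -1.0]], ["a", "b", "c"], k=2)
```

Sorting once and slicing from both ends keeps the negative and positive lists consistent with each other; for a full vocabulary, a partial selection such as `heapq.nsmallest` and `heapq.nlargest` would avoid the full sort.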
Activations Density 0.047%
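Activation density is usually read as the fraction of token positions on which the feature fires. The sketch assumes "fires" means an activation strictly above a threshold of 0; `activation_density` and `threshold` are hypothetical names, and the synthetic counts are chosen only to reproduce a figure of 0.047%.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 47 active positions out of 100,000 tokens.
acts = [0.0] * 99_953 + [1.0] * 47
density = activation_density(acts)
label = f"{density:.3%}"  # Python's % format multiplies by 100 and appends '%'
```

At this density the feature is active on roughly 1 in every 2,000 tokens, so the sampled activations that back the explanations above come from a small slice of the corpus.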