Explanations
the word "no" with a high activation value
phrases expressing the concept of "none" or "no."
Negative Logits
RAFT      -0.72
endar     -0.72
towed     -0.70
adobe     -0.66
bombed    -0.65
reborn    -0.64
tossed    -0.62
Lear      -0.60
Cutter    -0.60
hone      -0.58
Positive Logits
xious               1.14
oses                1.00
except              0.99
longer              0.95
doubt               0.90
avail               0.87
oooooooo            0.86
whatsoever          0.83
obs                 0.81
oooooooooooooooo    0.81
Activations Density 0.087%