INDEX
Explanations
phrases indicating examples or instances of something
phrases that introduce examples or instances of various concepts
Negative Logits
antage      -0.69
ushima      -0.69
orem        -0.64
allo        -0.64
ribution    -0.64
emi         -0.62
emale       -0.61
ogun        -0.60
enger       -0.60
ombat       -0.59
Positive Logits
ties          0.86
cond          0.72
embodiments   0.67
ities         0.64
minded        0.64
amount        0.64
ones          0.61
inyl          0.61
things        0.61
minded        0.59
Activations Density 0.033%