Explanations
phrases with the word "of" followed by highly activating words
phrases indicating possession or inclusion
Negative Logits
Token       Logit
dayName     -0.80
condem      -0.68
eele        -0.65
disposed    -0.63
uterte      -0.62
illac       -0.61
ettel       -0.61
subst       -0.60
gee         -0.60
uca         -0.60
Positive Logits
Token       Logit
THING       0.81
sorts       0.78
ahu         0.74
sudden      0.71
goddamn     0.71
together    0.71
imaginable  0.70
course      0.68
ources      0.67
important   0.66
Activations Density: 0.080%