INDEX
Explanations
adjectives describing intensity or extremity
extensive use of the word "so" to express strong emphasis or degree
NEGATIVE LOGITS
works      -0.70
theless    -0.69
nings      -0.67
ulia       -0.63
amac       -0.63
excerpts   -0.61
eviction   -0.61
Flavoring  -0.60
coincides  -0.60
prompts    -0.59
POSITIVE LOGITS
bered             1.15
ooo               1.14
oooo              1.13
oths              1.05
oooooooo          1.02
othes             0.97
oooooooooooooooo  0.95
far               0.89
othe              0.86
othing            0.86
Activations Density 0.068%