Explanations
phrases expressing surprise or amazement
the phrase "I have never" indicating experiences that have not occurred
Negative Logits
Token      Logit
uctions    -0.67
HP         -0.66
Reports    -0.64
Notes      -0.63
Copy       -0.61
Journals   -0.61
åĤ         -0.60
Towards    -0.59
DIT        -0.58
Pieces     -0.58
Positive Logits
Token      Logit
theless    1.10
been       1.08
been       1.05
existed    0.95
EVER       0.93
tasted     0.87
Been       0.86
seen       0.83
heard      0.83
bothered   0.82
Activations Density 0.041%
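
The density figure can be read as the share of sampled token positions on which the feature fires. The page does not define the term, so the sketch below rests on that assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 4 of which activate the feature.
sample = [0.0] * 9996 + [1.3, 0.7, 2.1, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.041% would mean the feature is active on roughly 4 of every 10,000 sampled tokens, which fits a feature tied to a narrow pattern such as "I have never".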