Explanations
phrases relating to first-hand experiences
NEGATIVE LOGITS
arter         -0.81
ramid         -0.74
phis          -0.73
oleon         -0.73
issa          -0.73
itary         -0.72
pty           -0.72
prime         -0.71
Puzzle        -0.69
yip           -0.69
POSITIVE LOGITS
knowledge      0.94
testimonies    0.93
firsthand      0.91
experience     0.90
testimony      0.88
witness        0.87
experiences    0.86
witnessing     0.86
Witness        0.86
eyewitness     0.84
Activation density: 0.026%