INDEX
Explanations
keywords signaling the reader to read additional content
instances of the word "READ" followed by numerical values
Negative Logits
Token        Logit
phi          -0.66
angel        -0.66
asketball    -0.66
afort        -0.64
oteric       -0.64
unity        -0.64
basketball   -0.63
ugi          -0.63
aviour       -0.62
venge        -0.61
Positive Logits
Token        Logit
READ         1.02
aloud        0.99
READ         0.94
ALSO         0.91
MORE         0.88
WRITE        0.84
WATCHED      0.83
NESS         0.82
MORE         0.82
TY           0.81
Activations Density: 0.006%