INDEX
Explanations
the word "one"
references to singular instances or concepts
Negative Logits
ulner    -0.68
folk     -0.68
ooks     -0.67
inders   -0.66
lain     -0.66
emies    -0.66
older    -0.65
Leaks    -0.63
hips     -0.63
ypes     -0.62
Positive Logits
hundred      0.92
Piece        0.78
sided        0.76
Hundred      0.75
particular   0.72
playthrough  0.71
IDA          0.71
hour         0.70
million      0.69
minute       0.68
Activations Density: 0.057%