INDEX
Explanations
Twitter usernames
underscore characters or special formatting
Negative Logits
atform        -0.74
Levine        -0.73
Holding       -0.69
Frazier       -0.69
Pearce        -0.69
FML           -0.68
Manson        -0.68
Ago           -0.68
quished       -0.68
Stef          -0.67
Positive Logits
default       1.06
dict          1.02
chance        1.00
blank         0.99
EStreamFrame  0.97
tro           0.95
events        0.94
gradient      0.94
token         0.92
delay         0.92
Activations Density: 0.024%