Explanations
sentences with strong conclusions or assertions
NEGATIVE LOGITS
Tahoe      -0.17
erna       -0.15
Ranch      -0.15
lej        -0.14
Clemson    -0.14
rai        -0.14
Purdue     -0.14
Baylor     -0.14
Californ   -0.14
↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵   -0.14
POSITIVE LOGITS
Brooklyn   0.42
NYC        0.38
NYPD       0.38
NY         0.38
NY         0.35
Staten     0.35
borough    0.34
Queens     0.34
Borough    0.33
Bronx      0.31
Activation density: 0.000%
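The page does not say how the logit lists or the density figure were computed. Dashboards of this kind commonly derive the lists by projecting a latent's decoder direction through the model's unembedding matrix and then taking the largest and smallest entries. They commonly define density as the share of token positions where the latent fires. The sketch below shows that procedure under those assumptions. The names `logit_effects`, `activation_density`, `decoder_dir` and `W_U` are hypothetical, and the toy matrix is illustrative only. It is not data from this page.

```python
import numpy as np

def logit_effects(decoder_dir, W_U, vocab, k=3):
    """Return the k most positive and k most negative (token, logit) pairs.

    Assumption: decoder_dir is the latent's (d_model,) decoder direction and
    W_U is a (d_model, vocab_size) unembedding matrix (both hypothetical here).
    """
    logits = decoder_dir @ W_U          # direct effect on each vocab logit
    order = np.argsort(logits)          # indices sorted ascending by logit
    bottom = [(vocab[i], float(logits[i])) for i in order[:k]]
    top = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return top, bottom

def activation_density(acts):
    """Percentage of token positions where the latent's activation is > 0."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > 0) / acts.size

# Toy, made-up example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["Brooklyn", "NYC", "Tahoe", "Ranch"]
W_U = np.array([[1.0, 0.5, -1.0, -0.2],
                [0.0, 0.5,  0.3, -0.4]])
decoder_dir = np.array([0.4, 0.2])

top, bottom = logit_effects(decoder_dir, W_U, vocab, k=2)
print(top)     # most-promoted tokens first
print(bottom)  # most-suppressed tokens first
print(activation_density([0.0, 0.0, 0.5, 0.0]))
```

A density reported as 0.000% would, under this definition, mean the latent fired on a vanishingly small fraction of the sampled tokens (below the display precision), not necessarily on none.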