INDEX
Explanations
references to specific entities, such as companies, countries, teams, players, or games
mentions of entities such as companies, countries, teams, and games
Negative Logits
ories       -0.67
puppies     -0.64
trophies    -0.64
elight      -0.63
Mans        -0.62
passports   -0.62
bolts       -0.62
latest      -0.61
Kid         -0.61
etsk        -0.61
Positive Logits
Called      0.94
ogram       0.78
alyst       0.76
onym        0.73
atical      0.72
ribe        0.71
whose       0.70
Named       0.69
scientist   0.69
ulent       0.69
Activations Density 0.361%