INDEX
Explanations
names and dates in a specific format
mentions of specific individuals or entities
Negative Logits
Takeru            -0.80
Classification    -0.73
Passage           -0.72
Triangle          -0.69
Perception        -0.66
ACTION            -0.66
accommodations    -0.66
DIRECT            -0.65
tampering         -0.65
CONC              -0.65
Positive Logits
atalie            0.81
ablo              0.81
fle               0.79
erk               0.78
ename             0.77
punk              0.75
eor               0.74
addon             0.74
official          0.73
arling            0.73
Activations Density 0.093%