INDEX
Explanations
the word "Beck" with varied activation levels
references to the name "Beck."
Negative Logits
Token                     Logit
ntil                      -0.88
ocre                      -0.80
[no-break spaces ×8]      -0.71
unnecess                  -0.70
rely                      -0.68
ACTION                    -0.67
ricular                   -0.65
────                      -0.62
ctic                      -0.61
AMES                      -0.61
Positive Logits
Token                     Logit
ett                       1.08
etts                      1.08
erman                     1.07
Beck                      1.07
mann                      1.01
stra                      0.99
oning                     0.98
ford                      0.98
enstein                   0.98
strap                     0.97
Activations Density 0.029%