INDEX
Explanations
instances of destruction or harmful actions involving fire
Negative Logits
Token       Logit
Fou         -0.16
Folk        -0.15
èħ          -0.15
Fraction    -0.14
Foo         -0.14
Fauc        -0.14
Flush       -0.14
fallback    -0.14
Fu          -0.14
Fool        -0.14
Positive Logits
Token       Logit
fire        1.03
fire        0.83
Fire        0.83
-fire       0.81
Fire        0.78
_fire       0.70
fires       0.70
.fire       0.70
FIRE        0.69
火          0.68
Activations Density 0.117%
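
Entries such as èħ in the logit lists are not ordinary text. They look like the display strings a byte-level BPE tokenizer uses for raw bytes, and under the same scheme the character 火 ("fire") is rendered as çģ«. The sketch below assumes the GPT-2 style byte-to-character table (an assumption, since the page does not name the tokenizer) and maps a display string back to its bytes. The helper names `display_to_bytes` and `decode_display` are hypothetical, chosen for illustration.

```python
def bytes_to_unicode():
    """Byte -> printable character table, as used by GPT-2 style byte-level BPE.

    Assumption: the dashboard's tokenizer uses this table; the page does not say.
    """
    # Bytes that are already printable keep their own code point.
    bs = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: display character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def display_to_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a byte-level display string."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


def decode_display(token: str) -> str:
    """Decode a display string as UTF-8; incomplete sequences become U+FFFD."""
    return display_to_bytes(token).decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(display_to_bytes("çģ«").hex())  # e7 81 ab, the UTF-8 bytes of 火
    print(decode_display("çģ«"))
    print(display_to_bytes("èħ").hex())   # only two bytes of a longer sequence
```

Under this assumption, èħ corresponds to the bytes `e8 85`, which is an incomplete UTF-8 sequence, so that token is a fragment of a multi-byte character and cannot be shown as a single readable character on its own.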