INDEX
Explanations
No Explanations Found
Negative Logits
Token         Logit
fam           -0.83
reshold       -0.80
iffe          -0.80
fp            -0.75
ACC           -0.75
utter         -0.73
rha           -0.71
imp           -0.68
CF            -0.68
antage        -0.67
Positive Logits
Token         Logit
spotting      0.75
ãĥ¬           0.73
infiltration  0.72
ndra          0.70
advising      0.67
hostilities   0.67
Sharing       0.66
Sense         0.65
Lighting      0.64
diplomacy     0.64
Activations Density 0.000%
No Known Activations
This feature has no known activations.